Abstract: Linear Finite State Machines (LFSMs) are particular
primitives widely used in information theory, coding theory
and cryptography. Among those linear automata, a particular 
case of study is Linear Feedback Shift Registers (LFSRs) used 
in many cryptographic applications such as design of stream 
ciphers or pseudo-random generation. LFSRs could be seen as 
particular LFSMs without inputs. 

In this paper, we first recall the description of LFSMs using 
traditional matrices representation. Then, we introduce a new 
matrices representation with polynomial fractional coefficients. 
This new representation leads to sparse representations and 
implementations. As direct applications, we focus our work on 
the Windmill LFSRs case, used for example in the E0 stream
cipher and on other general applications that use this new 
representation. 

In a second part, a new design criterion called diffusion delay 
for LFSRs is introduced and compared with existing related
notions. This criterion represents the diffusion capacity of an 
LFSR. Thus, using the matrices representation, we present a 
new algorithm to randomly pick LFSRs with good properties 
(including the new one) and sparse descriptions dedicated to 
hardware and software designs. We present some examples of 
LFSRs generated using our algorithm to show the relevance of 
our approach. 

Index Terms — LFSM, LFSR, m-sequences. 

I. Introduction 

Linear Finite State Machines (LFSMs) are a building block 
of many information theory based applications such as syn- 
chronization codes, masking or scrambling codes. They are 
also used for white noise signals in communication systems, 
signal sets in CDMA (Code Division Multiple Access) com- 
munications, key stream generators in stream cipher cryp- 
tosystems, random number generators in many cryptographic 
primitive algorithms, and as testing vectors in hardware design. 

A Linear Finite State Machine is a linear automaton com- 
posed of memories defined over a particular finite set A (typi- 
cally a finite field) and where the only operation updating cells 
is the addition HI, 0, 0. At each clock, it inputs n elements 
of A and outputs at least one element computed using its 
current state and a linear updating function based on additions. 
Two main classes of LFSMs could be defined: autonomous 
(without inputs in the updating process) and non-autonomous. 
This paper first recalls the traditional representation using 
transition matrices which is classically used to characterize 
autonomous and non-autonomous LFSMs. Then, it introduces 
a new fractional representation using rational power series,
i.e. the series are the quotient of two polynomials. Our new 
model is called Rational Linear Finite State Machines (RLFSMs)
and is a generalization of the previous matrix representations.
We present the link between the two approaches. As
a particular case of study of our new representation, we focus 
on windmill LFSRs defined by Smeets and Chambers in [4].
Those LFSRs are based upon particular polynomials producing 
in parallel v subsequences of a given LFSR sequence. Four 
windmill generators are used as parallel updating functions 
in the stream cipher E0 [5]. The windmill constructions have
been first extended in 0. In this paper, we show how we 
could, using the new rational representation, give a simple 
expression of those particular constructions and how this new 
theoretical representation could lead to clearly simplify the 
usual representation of circuits with multiple outputs at each 
iteration or parallelized versions of LFSRs. 

In a second step, we also introduce a new criterion for 
LFSMs to measure what we call diffusion delay. We compare 
this new criterion with the existing notions of auto-, cross- 
and simple correlations and show how this criterion captures 
an intrinsic behavior of the automaton itself. LFSMs are 
popular automata in many cryptographic applications and 
are particularly used as updating functions of stream ciphers 
and of pseudo-random generators. Their large popularity is 
due to their very simple design efficient both in hardware 
and in software and to the proved properties of the gener- 
ated sequence (statistical properties, good periods,...) if the 
associated polynomial is primitive. In many cryptographic 
applications, the diffusion delay of LFSMs is most of the 
time not considered. In this paper, we focus on this criterion, 
show its link with correlation and its effectiveness for several 
types of automata such as FCSRs or NLFSRs. We also give a 
new algorithm to construct hardware and/or software efficient 
LFSMs with good diffusion delay called Ring LFSRs. For the 
hardware case, we show theoretical bounds on the number 
of gates required to implement a ring LFSR compared with 
the traditional Galois and Fibonacci LFSRs and we compare 
the associated traditional properties. For the software case, 
we compare the properties and the performances of our Ring 
LFSR with the LFSR involved in the stream ciphers SNOW 
v2.0 Q, finalist of the NESSIE project 0. 

This paper is organized as follows: Section II gives some
background about Finite State Machines (FSMs) and introduces
notations. Section III presents previous works on
LFSMs. Section IV introduces the new rational representation
for LFSMs, detailing some examples of Windmill LFSRs and
of general applications. Section V presents the new diffusion
delay criterion, shows why this criterion captures new notions
and proposes hardware and software oriented implementations
with respect to this criterion. Finally, Section VI concludes this
paper.






A. Notations 

The finite field with cardinal $q$ is denoted $\mathbb{F}_q$. We denote
$\mathbb{F}_q[X]$ the ring of polynomials and $\mathbb{F}_q[[X]]$ the ring of power
series, both over $\mathbb{F}_q$. We will also use, in Section IV and
following, the ring $\mathcal{Q}$ of rational power series, that is the
ring of power series which can be written $P(X)/Q(X)$ where
$P, Q \in \mathbb{F}_q[X]$ with $Q(0) \neq 0$. We will recall in Theorem 2.1
that $\mathcal{Q}$ is the ring of power series that correspond to eventually
periodic sequences.

We will also use the notation $\mathcal{M}_{k,\ell}(\mathcal{R})$ for the ring of
matrices with $k$ rows and $\ell$ columns over a ring $\mathcal{R}$. For
convenience and not to make notations too heavy, we often
write vectors $v$ as rows $v = (v_1, \ldots, v_n)$ but also use them
as column vectors in expressions such as $Av$ where $A$ is a
matrix. Of course the correct form should be with explicit
transposition as in $A\,{}^t v$ but we expect the reader not to be
confused by this abuse of notation.

In Section V we will use the notation $W_H$ for the Hamming
weight. For example, the Hamming weight of a matrix is
its number of nonzero entries. The Hamming weight of a
polynomial is its number of nonzero coefficients.

II. Background 

A. Linear recurring sequences 

As the case of binary sequences is the most useful in 
pseudo-random generation, we deal in this paper with the two 
elements field F 2 . However most of the results presented here 
have a straightforward generalization when using another finite 
field as base field. 

Recall that a sequence $s = (s_i)_{i \in \mathbb{N}}$ over $\mathbb{F}_2$ is a linear
recurring sequence if there exist $q_1, \ldots, q_d \in \mathbb{F}_2$ such that
$s_n = q_1 s_{n-1} + \cdots + q_d s_{n-d}$ for all $n \ge d$. A binary sequence
$(s_i)_{i \in \mathbb{N}}$ can be seen as a power series $s(X) = \sum_{i=0}^{\infty} s_i X^i$. In
terms of power series, we have the following Theorem Q): 

Theorem 2.1: Let $s = (s_i)_{i \in \mathbb{N}}$ be a sequence over $\mathbb{F}_2$. The
following statements are equivalent:

• The sequence $s$ is a linear recurring sequence.

• The sequence $s$ is eventually periodic, i.e. there exists
$N \in \mathbb{N}$ such that $(s_i)_{i \ge N}$ is periodic.

• There exist polynomials $f(X), g(X) \in \mathbb{F}_2[X]$ with
$g(0) = 1$ such that the power series $f(X)/g(X)$ is equal
to $s(X)$.

Moreover, $s$ is periodic if and only if $f(X)$ and $g(X)$ are such
that $\deg f < \deg g$.

According to this Theorem a correspondence can be built 
between rational power series and sequences. The period of 
a linear recurring sequence is determined by the polyno- 
mial g(X) as shown by the following Theorem Q]: 

Theorem 2.2: Let $s(X) = f(X)/g(X)$ be a rational power
series, with $\gcd(f(X), g(X)) = 1$. We denote by $s$ the sequence
of coefficients of $s(X)$.

• The period of $s$ is equal to the order of $X$ in
$\mathbb{F}_2[X]/(g(X))$.

• If $g(X)$ is primitive then there exists $N \in \mathbb{N}$ such that
$\sum_{i \ge N} s_i X^{i-N} = 1/g(X)$.

When the polynomial $g(X)$ is primitive, the sequence $s$ has
period $2^{\deg g} - 1$ and is called an m-sequence.
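As an illustration of the period statement, the following minimal sketch iterates a linear recurrence over $\mathbb{F}_2$ and counts the steps until the state window returns to its initial value. The function name `lfsr_period` and its argument layout are hypothetical choices made for this example; it assumes $q_d = 1$ so that the state map is invertible.

```python
def lfsr_period(q, init):
    """Period of the state cycle of s_n = q_1 s_{n-1} + ... + q_d s_{n-d} over F_2.

    q    -- coefficients [q_1, ..., q_d] (assumed layout), with q_d == 1
    init -- initial window [s_0, ..., s_{d-1}], not all zero
    """
    d = len(q)
    state = list(init)   # holds (s_{n-d}, ..., s_{n-1})
    start = list(init)
    steps = 0
    while True:
        # next term: s_n = sum_j q_j * s_{n-j} (mod 2); s_{n-j} sits at index d-j
        s_n = 0
        for j in range(1, d + 1):
            s_n ^= q[j - 1] & state[d - j]
        state = state[1:] + [s_n]
        steps += 1
        if state == start:
            return steps


# g(X) = 1 + X + X^4 is primitive, so the period is 2^4 - 1
print(lfsr_period([1, 0, 0, 1], [0, 0, 0, 1]))
# g(X) = 1 + X^4 is not primitive; the sequence simply repeats every 4 terms
print(lfsr_period([0, 0, 0, 1], [0, 0, 0, 1]))
```

The second call shows that a non-primitive $g(X)$ of the same degree can give a much shorter period.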



B. Adjunct matrix 

Let $M = (m_{i,j})_{1 \le i,j \le n}$ be a square matrix over a ring $\mathcal{R}$.
The $(i,j)$-th cofactor $c_{i,j}$ of $M$ is $(-1)^{i+j}$ times the determinant
of the matrix obtained by removing row $i$ and
column $j$ from $M$. The transpose of the cofactor matrix $(c_{i,j})$ is
called the adjunct matrix of $M$ and we denote it by $\operatorname{adj}(M)$.
The adjunct of $M$ has its coefficients in $\mathcal{R}$ and satisfies the
following identity

\[ \operatorname{adj}(M) M = M \operatorname{adj}(M) = \det(M) I. \tag{1} \]

Hence, if $\det(M)$ is invertible, we have $M^{-1} = \det(M)^{-1} \operatorname{adj}(M)$.
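Identity (1) can be checked numerically on a small example. The sketch below works over the integers rather than over a polynomial ring, purely for readability; the helper names `minor`, `det`, `adj` and `matmul` are illustrative and not taken from the paper.

```python
def minor(M, i, j):
    """Matrix M with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]


def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))


def adj(M):
    """Adjunct matrix: transpose of the cofactor matrix."""
    n = len(M)
    if n == 1:
        return [[1]]
    # entry (i, j) of adj(M) is the (j, i)-th cofactor of M
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


M = [[2, 1, 0], [0, 1, 3], [1, 0, 1]]
print(det(M))             # determinant of M
print(matmul(adj(M), M))  # equals det(M) times the identity matrix
```

The same cofactor construction applies over any commutative ring, which is why it can be used later with polynomial entries.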

III. LFSMs

A. Definitions 

LFSMs (Linear Feedback State Machines) have been stud- 
ied in J9), 01) El) ED- They are a generalization of Linear 
Feedback Shift Registers, for which the shift structure is 
removed, i.e. each cell has no privileged neighbor. Let us give 
a definition of an LFSM (over F 2 ): 

Definition 3.1: A Linear Finite State Machine (LFSM) $\mathcal{L}$,
of length $n$, with $k$ inputs and $\ell$ outputs consists of:

• A set of $n$ cells, each of them storing a value in $\mathbb{F}_2$. The
content of the cells, a binary vector of length $n$, will be
denoted $m = (m_0, \ldots, m_{n-1})$ and is called the state of
the LFSM. We will sometimes call the set of these $n$ cells
the register.

• A transition function which is a linear function from $\mathbb{F}_2^n \times
\mathbb{F}_2^k$ to $\mathbb{F}_2^n$.

• An extraction function which is a linear function from
$\mathbb{F}_2^n$ to $\mathbb{F}_2^\ell$.

The behavior of an LFSM is described below:

1 The register is initialized to a state $m^{(0)} \in \mathbb{F}_2^n$ at time
$t \leftarrow 0$.

2 The extraction function is used to compute an output
vector $v^{(t)} \in \mathbb{F}_2^\ell$ from the state $m^{(t)}$.

3 A new state $m^{(t+1)}$ is computed from the current
state $m^{(t)}$ and from a vector $u^{(t)} \in \mathbb{F}_2^k$ input at time $t$
using the transition function. This new state is stored in
the register.

4 Execution continues by going back to Step 2, with $t \leftarrow
t + 1$.

An LFSM is a kind of finite state automaton, for which
the set of states is $\mathbb{F}_2^n$ and the transition function is linear.
However, an additional function gives the ability to output
data. An LFSM is also different from a finite state automaton
because the transition function may also depend on an input
vector. Note also that an LFSM does not terminate as it has
no final state.

A given LFSM can be entirely specified by a triplet of $\mathbb{F}_2$-matrices
$(A, B, C)$, of respective sizes $n \times n$, $n \times k$ and $\ell \times n$,
which describe the transition and extraction functions in the
following way. Given a state column vector $m^{(t)} \in \mathbb{F}_2^n$ and an
input column vector $u^{(t)} \in \mathbb{F}_2^k$, the next state vector $m^{(t+1)}$
and the present output vector $v^{(t)} \in \mathbb{F}_2^\ell$ are expressed by:

\[ m^{(t+1)} = A m^{(t)} + B u^{(t)}, \tag{2} \]
\[ v^{(t)} = C m^{(t)}. \tag{3} \]
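The four steps above, together with the state update and output equations (next state from $A$ and $B$, output from $C$), can be sketched as a small simulator. The function names `mat_vec` and `run_lfsm`, and the choice of plain Python lists of 0/1 values, are assumptions made for this illustration.

```python
def mat_vec(M, x):
    """Matrix-vector product over F_2 (entries are 0 or 1)."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in M]


def run_lfsm(A, B, C, m0, inputs):
    """Run an LFSM from state m0 on a list of input vectors.

    Returns the list of output vectors v(0), v(1), ... and the final state.
    """
    m = list(m0)
    outputs = []
    for u in inputs:
        outputs.append(mat_vec(C, m))        # output: v(t) = C m(t)
        Am = mat_vec(A, m)
        Bu = mat_vec(B, u)
        m = [x ^ y for x, y in zip(Am, Bu)]  # update: m(t+1) = A m(t) + B u(t)
    return outputs, m


# A toy LFSM of length 2 with one input bit and one output bit
A = [[0, 1], [1, 1]]
B = [[1], [0]]
C = [[1, 0]]
print(run_lfsm(A, B, C, [0, 1], [[0], [0], [0]]))
print(run_lfsm(A, B, C, [0, 1], [[1], [0], [0]]))
```

With a zero input the machine behaves as an autonomous LFSM; the second call shows how a single input bit changes the trajectory.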






For suitable matrices $A, B, C$, we will denote $\mathcal{L}(A, B, C)$
an LFSM with transition and extraction functions given by
Equations (2) and (3). For short, we will often call $A$ the transition
matrix of $\mathcal{L}$ (even when $B \neq 0$) while in fact the transition
function depends on both $A$ and $B$.

The polynomial defined now plays an important role in the 
theory of LFSMs: 

Definition 3.2: Let $\mathcal{L} = (A, B, C)$ be an LFSM. The
polynomial $\det(I - XA)$ is called the connection polynomial
of $\mathcal{L}$. We will denote it $Q_{\mathcal{L}}(X)$ or simply $Q(X)$.

Note that $Q(X) \in \mathbb{F}_2[X]$ has degree at most $n$ (with
equality iff $\det(A) \neq 0$). Moreover, $Q(0) = 1$, hence $Q(X)$
has an inverse in the ring $\mathbb{F}_2[[X]]$ of power series. More
precisely, $Q(X)^{-1}$ is in $\mathcal{Q}$.
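The connection polynomial can be computed directly from a small transition matrix. In the sketch below, a polynomial of $\mathbb{F}_2[X]$ is stored as a Python integer whose bit $i$ is the coefficient of $X^i$; this encoding and the names `pmul`, `pdet` and `connection_polynomial` are assumptions made for the example, and the cofactor expansion is only practical for small $n$.

```python
def pmul(a, b):
    """Product in F_2[X] of two polynomials stored as bitmasks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pdet(M):
    """Determinant over F_2[X] by Laplace expansion (signs vanish in characteristic 2)."""
    if len(M) == 1:
        return M[0][0]
    d = 0
    for j in range(len(M)):
        if M[0][j]:
            sub = [row[:j] + row[j + 1:] for row in M[1:]]
            d ^= pmul(M[0][j], pdet(sub))
    return d


def connection_polynomial(A):
    """det(I - XA) for a 0/1 matrix A, returned as a bitmask."""
    n = len(A)
    # entry (i, j) of I - XA: identity bit plus A[i][j] shifted to the X position
    M = [[(1 if i == j else 0) ^ (A[i][j] << 1) for j in range(n)] for i in range(n)]
    return pdet(M)


print(bin(connection_polynomial([[0, 1], [1, 1]])))  # invertible A: degree 2
print(bin(connection_polynomial([[1, 1], [1, 1]])))  # singular A: degree below 2
```

The second matrix is singular over $\mathbb{F}_2$, and its connection polynomial indeed has degree strictly less than $n$, while the constant term stays equal to 1 in both cases.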

B. Sequences obtained from an LFSM 

For each $t \in \mathbb{N}$, an LFSM outputs a vector $v^{(t)} =
(v_1^{(t)}, \ldots, v_\ell^{(t)})$ of $\ell$ bits. For each $i = 1, \ldots, \ell$, we will denote
$V_i^{(t_0)} = \sum_{t=0}^{\infty} v_i^{(t_0+t)} X^t$ the power series obtained from
the sequence $(v_i^{(t)})_{t \ge t_0}$. We also define $V^{(t_0)}$ as the vector
$(V_1^{(t_0)}, \ldots, V_\ell^{(t_0)})$ of power series. We consider also the
series $M_i^{(t_0)} = \sum_{t=0}^{\infty} m_i^{(t_0+t)} X^t$ obtained from the sequence
observed in each cell $m_i$ (for $1 \le i \le n$), and the vector
$M^{(t_0)} = (M_1^{(t_0)}, \ldots, M_n^{(t_0)})$ of power series. In a similar
way, we define $U^{(t_0)} = (U_1^{(t_0)}, \ldots, U_k^{(t_0)})$ from the input
sequences.

The sequences $M_i^{(t_0)}$ observed in the register, and the
output sequences $V_i^{(t_0)}$ satisfy interesting linear relations
(cf. m, |9), 0). We provide these relations in the next 
theorem. 

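Theorem 3.3: Let $\mathcal{L} = (A, B, C)$ be an LFSM. The vectors
$M^{(t_0)}$ and $V^{(t_0)}$ verify:

\[
\begin{cases}
M^{(t_0)} = \dfrac{\operatorname{adj}(I - XA)}{Q(X)} \left( m^{(t_0)} + X B U^{(t_0)} \right) \\[1ex]
V^{(t_0)} = C \, \dfrac{\operatorname{adj}(I - XA)}{Q(X)} \left( m^{(t_0)} + X B U^{(t_0)} \right)
\end{cases}
\]

Proof: For each $t \in \mathbb{N}$, we multiply Equation (2) and
Equation (3), taken at time $t_0 + t$, by $X^t$ and sum each of them over $t$. We get

\[ M^{(t_0+1)} = A M^{(t_0)} + B U^{(t_0)} \tag{4} \]
\[ V^{(t_0)} = C M^{(t_0)}. \tag{5} \]

But $M^{(t_0)} = m^{(t_0)} + X M^{(t_0+1)}$. Hence, with Equation (4) we
obtain

\[ M^{(t_0)} = X \left( A M^{(t_0)} + B U^{(t_0)} \right) + m^{(t_0)} \]

or also $(I - XA) M^{(t_0)} = X B U^{(t_0)} + m^{(t_0)}$. By Equation (1)
we obtain the first relation of Theorem 3.3. The second one
follows from Equation (5). ■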

Note that, as mentioned before, $1/Q_{\mathcal{L}}(X)$ is a power series.
So the expression given for $M^{(t_0)}$ in Theorem 3.3 does not
(in general) belong to $\mathbb{F}_2[X]$ but to $\mathbb{F}_2[[X]]$, even if the input
$U$ is of finite degree.

Note also that, when the LFSM $\mathcal{L}$ has no input (or more
generally when the input $U$ has finite degree), Theorem 3.3
gives expressions for $M^{(t_0)}$ and $V^{(t_0)}$ as quotients of two
polynomials, and so they belong to $\mathcal{Q}$, the ring of rational power
series.
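For an autonomous LFSM with a single output row, the rational expression of the output can be compared with a direct simulation. The sketch below is only a check on toy matrices: polynomials of $\mathbb{F}_2[X]$ are stored as integer bitmasks (bit $i$ is the coefficient of $X^i$), and all function names are hypothetical helpers introduced for this example.

```python
def pmul(a, b):
    """Product in F_2[X] of two bitmask polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def minor(M, i, j):
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]


def pdet(M):
    """Determinant over F_2[X] (no signs needed in characteristic 2)."""
    if len(M) == 1:
        return M[0][0]
    d = 0
    for j in range(len(M)):
        d ^= pmul(M[0][j], pdet(minor(M, 0, j)))
    return d


def padj(M):
    """Adjunct matrix over F_2[X]: entry (i, j) is the (j, i)-th cofactor."""
    n = len(M)
    if n == 1:
        return [[1]]
    return [[pdet(minor(M, j, i)) for j in range(n)] for i in range(n)]


def series_div(p, q, terms):
    """First `terms` coefficients of p(X)/q(X) in F_2[[X]]; requires q(0) = 1."""
    out = []
    for _ in range(terms):
        c = p & 1
        out.append(c)
        if c:
            p ^= q      # cancel the constant term
        p >>= 1         # divide the remainder by X
    return out


def predicted_output(A, c_row, m0, terms):
    """Coefficients of C adj(I - XA) m0 / Q(X) for an autonomous LFSM."""
    n = len(A)
    M = [[(1 if i == j else 0) ^ (A[i][j] << 1) for j in range(n)] for i in range(n)]
    Q, T = pdet(M), padj(M)
    num = 0
    for i in range(n):
        for j in range(n):
            if c_row[i] and m0[j]:
                num ^= T[i][j]
    return series_div(num, Q, terms)


def simulated_output(A, c_row, m0, terms):
    """Output bits obtained by clocking the autonomous LFSM directly."""
    m, out = list(m0), []
    for _ in range(terms):
        out.append(sum(c & x for c, x in zip(c_row, m)) % 2)
        m = [sum(a & x for a, x in zip(row, m)) % 2 for row in A]
    return out


A = [[0, 1], [1, 1]]
print(predicted_output(A, [1, 0], [0, 1], 6))
print(simulated_output(A, [1, 0], [0, 1], 6))
```

Both calls print the same six bits, which is the content of the theorem in the case $B = 0$.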



C. Autonomous LFSMs 

An important particular case of LFSMs is the one for which
the transition function does not depend on any input, that is
to say $B = 0$. Such an LFSM will be called an autonomous
LFSM. The following Theorem shows that some polynomials
$p_i$ (for $1 \le i \le n$) related to the components $m_i$ of the state
are divided by $X$ modulo $Q(X)$ at each clock cycle.

Theorem 3.4: Let $\mathcal{L}$ be an autonomous LFSM and put
$p^{(t)} = \operatorname{adj}(I - XA)\, m^{(t)}$ (for $t \in \mathbb{N}$). The relation $X p^{(t+1)} =
p^{(t)}$ modulo $Q(X)$ holds, for each $t$.

Proof: From Equation (2) we have $X m^{(t+1)} = X A m^{(t)} =
-(I - XA) m^{(t)} + m^{(t)}$. Multiplication by $\operatorname{adj}(I - XA)$ gives
$X p^{(t+1)} = -Q(X) m^{(t)} + p^{(t)}$. ■

D. Similar LFSMs 

Two LFSMs defined by two distinct triples $(A, B, C)$ and
$(A', B', C')$ may produce the same output. This is the case of
similar LFSMs, which were defined in 0, |9l . 

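Definition 3.5: Given two LFSMs $\mathcal{L} = (A, B, C)$ and $\mathcal{L}' =
(A', B', C')$, $\mathcal{L}$ and $\mathcal{L}'$ are said to be similar if there exists a
non-singular matrix $P$ over $\mathbb{F}_2$ such that:

\[ A' = P^{-1} A P, \quad B' = P^{-1} B, \quad C' = C P. \]

The matrix $P$ is called the change of basis matrix from $\mathcal{L}$ to $\mathcal{L}'$.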

Theorem 3.6: Let $\mathcal{L}$ and $\mathcal{L}'$ be two similar LFSMs. Assume
that their initial state vectors satisfy $m'^{(0)} = P^{-1} m^{(0)}$ and that
they have the same input $U^{(0)} = U'^{(0)}$. Then:

1) Both LFSMs $\mathcal{L}$ and $\mathcal{L}'$ have the same connection polynomial.

2) $M'^{(0)} = P^{-1} M^{(0)}$. In particular, $m'^{(t)} = P^{-1} m^{(t)}$
holds for each $t \ge 0$.

3) The sequences output by $\mathcal{L}$ and $\mathcal{L}'$ are equal: $V'^{(0)} =
V^{(0)}$. In particular, $v'^{(t)} = v^{(t)}$ holds for each $t \ge 0$.

Proof:

1) The first claim results from $\det(I - XA') = \det(I -
X P^{-1} A P) = \det(P^{-1}(I - XA)P) = \det(I - XA)$.

2) Let us prove the second claim by induction. If
$m'^{(t)} = P^{-1} m^{(t)}$ for some $t$, then Equation (2)
gives $P^{-1} m^{(t+1)} = P^{-1} A m^{(t)} + P^{-1} B u^{(t)} =
P^{-1} A P m'^{(t)} + P^{-1} B u^{(t)} = A' m'^{(t)} + B' u'^{(t)} =
m'^{(t+1)}$.

3) Finally, using Equation (3), $v'^{(t)} = C' m'^{(t)} =
C P P^{-1} m^{(t)} = C m^{(t)} = v^{(t)}$. This proves the last
claim. ■
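The statement on similar LFSMs can be checked on a toy autonomous example. In the sketch below, the matrices `A`, `C`, `P` and the initial state are arbitrary illustrative choices (with `P` equal to its own inverse over $\mathbb{F}_2$), and the helper names are hypothetical.

```python
def mat_mul(X, Y):
    """Matrix product over F_2."""
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in zip(*Y)] for row in X]


def mat_vec(M, x):
    """Matrix-vector product over F_2."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in M]


def outputs(A, C, m, steps):
    """Output vectors of the autonomous LFSM (A, 0, C) started in state m."""
    out = []
    for _ in range(steps):
        out.append(mat_vec(C, m))
        m = mat_vec(A, m)
    return out


A = [[0, 1], [1, 1]]
C = [[1, 0]]
P = [[1, 1], [0, 1]]
P_inv = [[1, 1], [0, 1]]            # P * P is the identity over F_2

A2 = mat_mul(P_inv, mat_mul(A, P))  # A' = P^{-1} A P
C2 = mat_mul(C, P)                  # C' = C P
m = [0, 1]
m2 = mat_vec(P_inv, m)              # m'(0) = P^{-1} m(0)

print(outputs(A, C, m, 6))
print(outputs(A2, C2, m2, 6))
```

The two printed lists coincide even though the internal states of the two machines differ at each clock.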



E. Classical families of autonomous LFSMs 

Different special cases of LFSMs have been well known for years
and have been extensively studied, with some variations
of terminology among different scientific communities, for 
example the theoretic and electronic communities as (9), 10, 
ifTTI and the cryptographic community as 02, 03, El, iflOll . 
We gather in this subsection some of these special cases, using 
notations consistent with the one we used above. 

The most famous special cases of LFSMs are:

• the Fibonacci Linear Feedback Shift Registers, also
known as External-XOR LFSR, or just LFSR;

• the Galois Linear Feedback Shift Registers, also known
as Internal-XOR LFSR, or Canonical LFSR.

Fig. 1. Transition matrices of Galois and Fibonacci LFSRs with connection
polynomial $Q(X) = q_n X^n + \cdots + q_1 X + 1$

Fig. 2. Implementation of Galois and Fibonacci LFSRs with connection
polynomial $Q(X) = q_n X^n + \cdots + q_1 X + 1$

Fig. 3. Transition matrix and implementation of a 3-neighborhood Cellular
Automaton

Table I. States of $\mathcal{L}_0$, $\mathcal{L}_1$ and $\mathcal{L}_2$ during 8 clocks.

A Galois or Fibonacci LFSR is defined by its connection
polynomial because the transition matrix $A$ has a special form
and can be deduced from it. The matrices $B$ and $C$ are
simple because LFSRs have no input and because they output
a single bit. The transition matrices for Galois and Fibonacci
are shown in Figure 1. Figure 2 presents the corresponding
implementations.

It can be shown that the matrices $T_F$ and $T_G$ given in
Figure 1 are similar matrices (because they are "transposed
with respect to the second diagonal" of each other).
Hence, the Galois and Fibonacci LFSRs with the same connection
polynomial are similar LFSMs in the sense of Definition 3.5.

Another special kind of LFSMs is the 3-neighborhood 
cellular automaton (CA) |QT|, El, D2), El- These automata 
are characterized by a tri-diagonal matrix as presented in
Figure 3. They are suitable for hardware implementation.

To cover numerous kind of automata presented in 0, ifPTl . 
[ 16 1, 1 18 1, we introduce Ring LFSRs. The cells which store the 
state are organized in a cyclic shift register. This corresponds 
to a transition matrix of a particular form: 

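Definition 3.7: An LFSM $\mathcal{L}$ with transition matrix $A$ is
called a Ring Linear Feedback Shift Register if $A =
(a_{i,j})_{0 \le i,j < n}$ has the following form:

\[
\begin{cases}
a_{i,i+1} = 1 & \text{for all } 0 \le i < n - 1 \\
a_{n-1,0} = 1
\end{cases}
\]

i.e.,

\[
A = \begin{pmatrix}
 & 1 & & (*) \\
 & & \ddots & \\
(*) & & & 1 \\
1 & & &
\end{pmatrix}
\]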

In particular, Galois and Fibonacci LFSRs are special cases 
of Ring LFSRs. 
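Reading the definition as requiring $a_{i,i+1} = 1$ for $0 \le i < n-1$ and $a_{n-1,0} = 1$, with the remaining entries left free, the structural condition can be tested as follows. This reading of the garbled display and the function name `is_ring_lfsr` are assumptions of this sketch.

```python
def is_ring_lfsr(A):
    """Check the cyclic-shift skeleton of a 0/1 transition matrix A.

    Assumed conditions: a[i][i+1] == 1 for 0 <= i < n-1 and a[n-1][0] == 1;
    every other entry may be 0 or 1.
    """
    n = len(A)
    shift_ok = all(A[i][i + 1] == 1 for i in range(n - 1))
    return shift_ok and A[n - 1][0] == 1


# pure cyclic shift on 4 cells
shift = [[0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [1, 0, 0, 0]]
# same skeleton with one extra feedback connection
with_feedback = [row[:] for row in shift]
with_feedback[1][3] = 1

print(is_ring_lfsr(shift), is_ring_lfsr(with_feedback))
```

Extra connections only add ones outside the shift skeleton, so the check stays true for the second matrix.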

We detail here a complete example of these automata.
Consider the primitive connection polynomial $Q(X) = X^8 +
X^6 + X^5 + X^3 + 1$. Denote $\mathcal{L}_0$ the associated Galois LFSR, $\mathcal{L}_1$
the associated Fibonacci LFSR and $\mathcal{L}_2$ a generic Ring LFSR
with connection polynomial $Q(X)$. We present their respective
transition matrices $T_0$, $T_1$ and $T_2$ in Figure 4. Figure 5 shows
the implementation of $\mathcal{L}_0$, $\mathcal{L}_1$ and $\mathcal{L}_2$ whereas Table I displays
the states of these automata during 8 clocks starting from the
same initial state.

The reader can see that from the same initial state
00000001 the output sequences are distinct. However,
they are all part of the same m-sequence defined by
$Q(X) = X^8 + X^6 + X^5 + X^3 + 1$ according to Theorem 3.3.
In other words there exist three different polynomials
$P_0(X)$, $P_1(X)$, $P_2(X)$ of degrees less than 8 such that
the sequences generated by $\mathcal{L}_0$, $\mathcal{L}_1$ and $\mathcal{L}_2$ are respectively
$P_0(X)/Q(X)$, $P_1(X)/Q(X)$ and $P_2(X)/Q(X)$.

Fig. 5. Three LFSRs with connection polynomial $Q(X) = X^8 + X^6 +
X^5 + X^3 + 1$

IV. Rational representation 

In this section, we will introduce a generalization of LFSRs 
and LFSMs by extending the set of possible coefficients for 
the transition matrix to rational fractions. This new approach 
is not only of theoretical interest, but is also an interesting tool 
for both having a more global view of complex circuits and 



for constructing more complex circuits from smaller LFSMs 
with nice properties. Each coefficient of such a matrix is a 
rational fraction which represents a small LFSM. The inputs 
and outputs of each small LFSM are thus used as a part of 
the full automaton. 

This new representation allows an easier description of
complex circuits with small internal components such as the
so-called Windmill generators [4]. These generators are for
example used in the stream cipher E0 [5] implemented in the
Bluetooth system.

This rational representation leads to a simpler representation 
of some circuits with multiple outputs at each iteration or of 
parallelized versions of LFSRs. 

This section is organized as follows: we first focus our 
analysis on LFSMs with a single input and a single output. 
Then we introduce the notion of transition matrix with rational 
coefficients. We demonstrate that the automata built using this
new representation essentially produce the same sequences
as the classical LFSRs. We give a first example based on
this new representation to construct a filtered LFSR automaton.
We then focus our work on the case of Windmill generators
and give a simpler and more compact definition of such
LFSRs. We then discuss the difficulty of implementing such
automata, which is not easy in the general case. Finally, we
conclude this section with a concrete example. It consists of a
generalization of Windmill generators that makes it possible to construct
complex circuits from simpler well designed circuits. These
simple circuits are building blocks of a bigger automaton 
which connects the small components in a circular way. The 
full circuit inherits good internal properties of the smaller 
components. 

A. LFSMs with a single input and a single output 

As a building block for our representation, we are first
interested in an LFSM with a single input bit and a single
output bit. In this situation, the matrix $B$ is an $n \times 1$ matrix,
with a single 1 in position $i_0$. Likewise, $C$ is a $1 \times n$ matrix,
with a single 1 in position $j_0$.

Set $A' = \operatorname{adj}(I - XA) = (A'_{i,j}(X))$, where the coefficients
$A'_{i,j}(X)$ are polynomials, and $Q(X) = \det(I - XA)$. We can
derive from Theorem 3.3 the following relation between the
input series $U^{(t)}$ and the output series $V^{(t)}$:



\[ V^{(t)} = \frac{X}{Q(X)}\, C A' B\, U^{(t)} + \frac{1}{Q(X)}\, C A' m^{(t)} \]

Note that $C A' B = A'_{j_0,i_0}(X)$ is a polynomial, and
$P^{(t)}(X) = C A' m^{(t)}$ is also a polynomial. Setting $R(X) =
X A'_{j_0,i_0}(X)$, we can rewrite the previous formula

\[ V^{(t)} = \frac{R(X)}{Q(X)}\, U^{(t)} + \frac{P^{(t)}(X)}{Q(X)} \]

Note that $R(X)$ is independent of the internal state $m^{(t)}$ of
the LFSM, and $P^{(t)}(X)$ is uniquely determined by the internal
state $m^{(t)}$ of the LFSM.

So up to initial internal values of such an LFSM, we can
consider that it performs the multiplication of the input by
the rational series $R(X)/Q(X)$ (note that, since $Q(X) =
\det(I - XA)$, we have $Q(0) = 1 \neq 0$).

Fig. 6. Implementation of a division/multiplication circuit

Conversely, for a given rational power series $R(X)/Q(X)$ with
$Q(0) \neq 0$, it is possible to construct many LFSMs which
perform the multiplication by $R(X)/Q(X)$.
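At the level of sequences, multiplying an input by $R(X)/Q(X)$ can be sketched without building any circuit, by expanding the quotient as a power series. The bitmask encoding of polynomials (bit $i$ is the coefficient of $X^i$), the restriction to an input of finite degree, and the names `pmul`, `series_div` and `multiply_by_rational` are assumptions made for this illustration.

```python
def pmul(a, b):
    """Product in F_2[X] of two bitmask polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def series_div(p, q, terms):
    """First `terms` coefficients of p(X)/q(X) in F_2[[X]]; requires q(0) = 1."""
    out = []
    for _ in range(terms):
        c = p & 1
        out.append(c)
        if c:
            p ^= q
        p >>= 1
    return out


def multiply_by_rational(R, Q, U, terms):
    """Coefficients of (R(X)/Q(X)) * U(X) for a polynomial input U(X)."""
    return series_div(pmul(R, U), Q, terms)


R, Q = 0b10, 0b111   # R(X) = X, Q(X) = 1 + X + X^2
print(multiply_by_rational(R, Q, 0b1, 6))    # response to the input U(X) = 1
print(multiply_by_rational(R, Q, 0b11, 6))   # response to the input U(X) = 1 + X
```

The second response is the sum of the first one and its shift by one position, as expected from linearity.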

As an example of such LFSMs, we give in Figure 6 an
LFSM with one input and one output which performs the
multiplication by $R(X)/Q(X)$, called in the rest of this paper
a Galois vane (in reference to a Galois LFSR and a vane of a
windmill generator).

The matrix description of this LFSM is: 
A =



and C = (1,0,..., 0). 

It will be interesting to use some multiplication/division
circuits which are not performed by a Galois vane. As an
example, we consider the ring LFSR described in Figure 5.
The connection polynomial is $Q(X) = X^8 + X^6 + X^5 + X^3 + 1$.
Let $T' = \operatorname{adj}(I - XT_2)$; we have $T'_{1,1} = X^6 + X^3 + 1$, and
another entry of $T'$ is $X^7 + X^5 + X^4 + X^2$. For ${}^tB = C = (1, 0, \ldots, 0)$,
this ring LFSR performs the multiplication by $(X^6 + X^3 +
1)/(X^8 + X^6 + X^5 + X^3 + 1)$. For ${}^tB = (0,0,0,1,0,0,0,0)$
and $C = (0, 0, 1, 0, 0)$, it performs the multiplication by
$(X^7 + X^5 + X^4 + X^2)/(X^8 + X^6 + X^5 + X^3 + 1)$. For these
two examples, the circuit is simpler than the equivalent one
obtained by the Galois vane.



B. Rational Linear Machines 

Now, we want to use multiplications by rational power series
$R(X)/Q(X)$, with $Q(0) \neq 0$, as internal building blocks in
order to construct bigger LFSMs.

Recall that we denote by $\mathcal{Q}$ the ring of rational power
series, that is $\{P(X)/Q(X) \in \mathbb{F}_2[[X]] \mid P(X), Q(X) \in
\mathbb{F}_2[X],\ Q(0) \neq 0\}$.

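Definition 4.1: A Rational Linear Machine (RLM) $\mathcal{L}$ with
$k$-bit input, $\ell$-bit output and length $n$ over $\mathcal{Q}$ is a triplet of
matrices $(A, B, C)$ over $\mathcal{Q}$, of respective sizes $n \times n$, $n \times k$,
$\ell \times n$. Given the current state vector $(m^{(t)}, c^{(t)}) \in \mathcal{M}_{n,1}(\mathbb{F}_2) \times
\mathcal{M}_{n,1}(\mathcal{Q})$ and input vector $u^{(t)} \in \mathcal{M}_{k,1}(\mathbb{F}_2)$, the next state
vector $(m^{(t+1)}, c^{(t+1)})$ and the present output vector $v^{(t)} \in
\mathcal{M}_{\ell,1}(\mathbb{F}_2)$ are expressed as:

\[ m^{(t+1)} = A m^{(t)} + c^{(t)} + B u^{(t)} \bmod X \]
\[ c^{(t+1)} = A m^{(t)} + c^{(t)} + B u^{(t)} \operatorname{div} X \]

where $P(X) \operatorname{div} X = (P(X) - (P(X) \bmod X))/X$.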

As previously we are able to describe the output sequences:

Theorem 4.2: Let $\mathcal{L} = (A, B, C)$ be a RLM. The vector $M^{(t)}$
satisfies the relation:

\[ M^{(t)} = (I - XA)^{-1} \left( m^{(t)} + X c^{(t)} + X B U^{(t)} \right) \]

Proof: With the previous notations we have the following
relations:

\[ M^{(t+1)} = A M^{(t)} + c^{(t)} + B U^{(t)} \tag{6} \]
\[ M^{(t)} = X M^{(t+1)} + m^{(t)} \tag{7} \]

Equation (6) is by Definition 4.1. Equation (7) comes from the
definition of $M^{(t)}$. It leads to the following relation:

\[ (I - XA) M^{(t)} = m^{(t)} + X c^{(t)} + X B U^{(t)} \]

Note that $(I - XA)$ is invertible in $\mathcal{M}_n(\mathcal{Q})$. This leads to
$M^{(t)} = (I - XA)^{-1}(m^{(t)} + X c^{(t)} + X B U^{(t)})$ in $\mathcal{Q}$. ■

C. Rational Linear Finite State Machines 

In order to focus the attention on some applications, and
for a better understanding of the significance of Theorem 4.2,
we focus in this Section on the study of RLMs with no input.
Moreover, we will try to limit the domain of the "carries"
register $c$ in order to ensure that the machine is a finite state
machine. We suppose in the sequel that $B = 0$, i.e. there is
no input.

In order to restrict RLMs to finite state machines, we have
to look at the evolution of the "internal memories" $c^{(t)}$ in more
detail. Let $A_{i,j} = P_{i,j}(X)/Q_{i,j}(X)$ be the expression of a
coefficient of the matrix $A$ as a quotient of two polynomials.
For a fixed row $i$ we can compute the polynomial $Q_i(X) =
\operatorname{lcm}(Q_{i,1}(X), \ldots, Q_{i,n}(X))$. So we can normalize the
rational representations as follows: $A_{i,j} = R_{i,j}(X)/Q_i(X)$.
For each row $i$ we define the following finite subset of $\mathcal{Q}$:
$W_i = \{R(X)/Q_i(X) \mid \deg(R(X)) < \max_j(\deg(R_{i,j}(X)))\}$.
Finally we define $W = \prod_{i=1}^{n} W_i \subset \mathcal{Q}^n$. Note that $W$ is
a finite set. The following proposition shows that it is a
"reasonable" set for the values of the internal memories:

Proposition 4.3: Suppose that at time $t_0$, $c^{(t_0)}$ is in $W$, then
for any $t > t_0$, $c^{(t)}$ is in $W$.

Proof: Let $\mu^{(t+1)} = A m^{(t)} + c^{(t)}$. From the definition
of a RLM, we have $m^{(t+1)} = \mu^{(t+1)} \bmod X$ and $c^{(t+1)} =
\mu^{(t+1)} \operatorname{div} X$.

If we consider the $i$-th row of $A$, we obtain

\[ \mu_i^{(t+1)} = \sum_{j=1}^{n} m_j^{(t)} R_{i,j}(X)/Q_i(X) + c_i^{(t)}. \]

So under the condition
$c_i^{(t)} \in W_i$, $\mu_i^{(t+1)}$ can be expressed as a rational fraction of the
form $R'_i/Q_i$ with $\deg(R'_i) \le \max_j(\deg(R_{i,j}(X)))$; this implies
$c_i^{(t+1)} \in W_i$. ■

Following this result we want to limit the "carries" part of
a RLM to the domain $W$. So we give the following definition
for RLFSMs, which are true finite state machines.

Definition 4.4: A Rational Linear Finite State Machine
(RLFSM) with $\ell$-bit output and length $n$ over $\mathcal{Q}$ is a finite
state automaton defined by a pair $(A, C)$ of matrices over $\mathcal{Q}$,
with respective sizes $n \times n$ and $\ell \times n$. The space of states of this
automaton is $\mathbb{F}_2^n \times W$ where $W$ is defined from $A$ as previously
explained. The transition and extraction functions at time $t$ are
defined by: if the automaton is in the state $(m^{(t)}, c^{(t)})$ at time
$t$ and $v^{(t)}$ is the output at time $t$, then

\[ m^{(t+1)} = A m^{(t)} + c^{(t)} \bmod X \]
\[ c^{(t+1)} = A m^{(t)} + c^{(t)} \operatorname{div} X \]



Now, we want to characterize in more details the output of 
a RLFSM. Set G(X) = ^=1 Qi( x )- We have A = q^A', 
where A' is a matrix with polynomial coefficients. 

From the definition of $A'$, we have $\det(I - XA) =
\det(G(X) I - XA')/G(X)^n = T(X)/G(X)^n$, where $T(X) = \det(G(X) I -
XA')$ is a polynomial. So we obtain $(I - XA)^{-1} =
\frac{G(X)}{T(X)} \operatorname{adj}(G(X) I - XA')$, where $\operatorname{adj}(G(X) I - XA')$ is a matrix with
polynomial coefficients.

We can easily deduce the rational form of the output of a 
RLFSM 

Proposition 4.5: Let $\mathcal{L}$ be a RLFSM defined by a transition
matrix $A$ and any output matrix $C$. Set $T(X) =
\det(G(X)(I - XA))$. The output sequences $V_i^{(t)}$ are rational
power series of the form $P_i(X)/T(X)$.

Proof: This result comes from the formula

\[ M^{(t)} = (I - XA)^{-1} \left( m^{(t)} + X c^{(t)} \right) = \frac{G(X)}{T(X)} \operatorname{adj}(G(X) I - XA') \left( m^{(t)} + X c^{(t)} \right). \]

Indeed, the denominators of the coefficients of the matrix $(I -
XA)^{-1}$ are some divisors of $T(X)$, $m^{(t)}$ is a binary vector
and $c^{(t)} \in W$ is such that $G(X) X c^{(t)}$ is a polynomial vector. ■



Note that the rational power series $P_i(X)/T(X)$ are a
priori not irreducible. In practice, the numerator is often the
polynomial $Q(X)$ such that $Q(X)/P(X)$ is the irreducible
rational representation of $\det(I - XA)$.




Let $B = (I - XA')^{-1}$, and $Q(X) = \det(I - XA') =
X^{12} + X^{11} + X^9 + X^7 + X^6 + X^5 + 1$. Then the value of $B$ is

\[
B = \frac{1}{Q(X)}
\begin{pmatrix}
P_{1,1} & P_{1,2} & P_{1,3} & P_{1,4} \\
P_{2,1} & P_{2,2} & P_{2,3} & P_{2,4} \\
P_{3,1} & P_{3,2} & P_{3,3} & P_{3,4} \\
P_{4,1} & P_{4,2} & P_{4,3} & P_{4,4}
\end{pmatrix}
\]

with $P_{1,1}(X) = 1$, $P_{1,2}(X) = X^5$, $P_{1,3}(X) = X^7$,
$P_{1,4}(X) = X^9$, $P_{2,1}(X) = X^7 + X^6 + X^4 + X^2 + X$,
$P_{2,2}(X) = X^5 + 1$, $P_{2,3}(X) = X^7 + X^6 + X^5 + 1$,
$P_{2,4}(X) = X^9 + X^8 + X^7 + X^2$, $P_{3,1}(X) = X^5 + X^4 + X^2$,
$P_{3,2}(X) = X^{10} + X^9 + X^7$, $P_{3,3}(X) = X^7 + X^6 + X^5 + 1$,
$P_{3,4}(X) = X^9 + X^8 + X^7 + X^2$, $P_{4,1}(X) = X^3 + X^2$,
$P_{4,2}(X) = X^8 + X^7$, $P_{4,3}(X) = X^{10} + X^9$ and
$P_{4,4}(X) = X^9 + X^7 + X^6 + X^5 + 1$.

If we denote by $(a_0, \ldots, a_{11})$ the initial state at time $t = 0$
of the binary LFSR, then the initial state of our RLFSM is
$m^{(0)} = (a_0, a_5, a_7, a_9)$ and $c^{(0)} = (a_1 + a_2 X + a_3 X^2 +
a_4 X^3,\ a_6,\ a_8,\ a_{10} + a_{11} X)$ and the output sequences are

\[ \frac{a_0 P_{1,1}(X) + a_5 P_{1,2}(X) + a_7 P_{1,3}(X) + a_9 P_{1,4}(X) + (a_1 + a_2 X + a_3 X^2 + a_4 X^3) X}{Q(X)} \]

\[ \frac{a_0 P_{2,1}(X) + a_5 P_{2,2}(X) + a_7 P_{2,3}(X) + a_9 P_{2,4}(X) + a_6 X}{Q(X)} \]

\[ \frac{a_0 P_{3,1}(X) + a_5 P_{3,2}(X) + a_7 P_{3,3}(X) + a_9 P_{3,4}(X) + a_8 X}{Q(X)} \]

\[ \frac{a_0 P_{4,1}(X) + a_5 P_{4,2}(X) + a_7 P_{4,3}(X) + a_9 P_{4,4}(X) + (a_{10} + a_{11} X) X}{Q(X)} \]

D. A first example 

We consider a filtered LFSR in Galois mode of size $n = 12$
with connection polynomial $Q(X) = 1 + X^5 + X^6 + X^7 +
X^9 + X^{11} + X^{12}$, filtered by a Boolean function in cells $m_0$,
$m_5$, $m_7$ and $m_9$.

If we are interested only in the filtered output bits, this
LFSR can be described by a RLFSM with the matrix



A' = 



( X 4 X 4 

1 + X 

X 

\X + X 2 



0\ 

X 

X 

0/ 



E. Application to windmill LFSRs 

Windmill LFSRs can be defined as LFSMs with no input
and several outputs. They have been introduced in [4] as
a cyclic cascade connection of $v > 1$ LFSMs. Each of
these LFSMs is called a vane of the windmill. The classical
representation of those LFSMs is the Fibonacci one. However,
in the rest of this section, we will show them using the
equivalent Galois representation because it is easier to
understand. Windmill LFSRs are characterized by
their feedback and feedforward connections. These feedback
and feedforward connections are identical for all vanes, but
the lengths of the LFSMs may be different as they can be
shifted in different LFSMs. Figure 6 presents a generic vane
in Galois mode.

This matrix leads to a new representation of this RLFSM.

Windmill LFSRs were introduced to achieve parallel generation of sequences. Consider a sequence $S = (s_n)_{n \in \mathbb{N}}$. While a classical automaton outputs $s_0$ at the first clock, $s_1$ at the second, and so on, a parallel automaton outputs $v$ bits at each clock: $(s_0, s_1, \dots, s_{v-1})$ at the first clock, $(s_v, \dots, s_{2v-1})$ at the second, etc. More precisely, a parallel automaton has $v$ outputs and produces the sequences $S^i := (s_{nv+i})_{n \in \mathbb{N}}$, where $0 \le i < v$. Note that our study focuses on characterizing the sequences $S^i$ and not the reconstructed sequence $S$.

Fig. 7. A windmill with only feedforward connections.

Fig. 8. A windmill in rational representation. [Figure: four vanes, three labelled $X^5 + X^3 + X^2 + 1$ and one labelled $(X^5 + X^3 + X^2 + 1) \cdot X$.]
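The relation between the sequence $S$ and the $v$ parallel output sequences $S^i$ can be illustrated with a minimal sketch. The function names `split_phases` and `interleave` are hypothetical helpers chosen for this illustration, and the sequence is taken as a finite list whose length is a multiple of $v$.

```python
def split_phases(s, v):
    """Return the v sequences S^i = (s[n*v + i])_n for 0 <= i < v."""
    return [s[i::v] for i in range(v)]


def interleave(phases):
    """Rebuild S from its v phases by taking one term of each per clock."""
    return [term for clock in zip(*phases) for term in clock]


# Illustrative data only: s_n = n, with v = 4 outputs per clock.
s = list(range(12))
phases = split_phases(s, 4)
first_clock = [p[0] for p in phases]   # (s_0, ..., s_{v-1})
second_clock = [p[1] for p in phases]  # (s_v, ..., s_{2v-1})
```

Each list in `phases` plays the role of one output of the parallel automaton; reading one term from every list at each clock gives back $v$ consecutive terms of $S$.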

Consider the windmill presented in Figure [7], which is the one used in the stream cipher E0 [5]. It consists of one vane of length 7 and three identical vanes of length 6. No feedback connection appears. Feedforward connections appear, for example from cell $m_{13}$ to cells $m_{12}$, $m_{10}$, $m_9$ and $m_7$.

Until now, only windmill LFSRs with a single vane repeated several times have been studied. We generalize this definition by allowing different vanes in a windmill. We also give a new, more compact description of this windmill. More precisely, using the example, we want to consider the output sequences of cells $m_0$, $m_7$, $m_{13}$ and $m_{19}$, and characterize each vane by a polynomial. This leads to the interpretation presented in Figure [8].

With this definition, the LFSM described in Figure [8] has the following transition matrix:



(X 5 + X 3 



X 1 



A) 





X 








1 

0/ 

We give in Table [II] the values of the states during 8 clocks.

According to Definition 4.4, windmills as introduced by Smeets and Chambers [4] agree with the following definition:

Definition 4.6: A windmill LFSR with polynomials $\alpha(X)$, $\beta(X)$ with $\beta(0) \neq 0$ and $v$ vanes is an LFSR of length



with matrix A over F2[[X]] of the form: 



/ 



a(X) 



X la 



(0) \ 



(0) 




a i x ) Yiv-2 





where $0 \le i_0, \dots, i_{v-1}$.

With this representation each row represents a vane of the windmill. In particular, as described in the following section, the length of the vane $j$ is equal to $\max(\deg(\alpha(X) X^{i_j}), \deg(\beta(X)))$.

By a straightforward calculation, we obtain $\det(I - XA) = X^n (\alpha(X)/\beta(X))^v + 1$, where $n = i_0 + \cdots + i_{v-1}$. Setting $Q(X) = X^n \alpha(X)^v + \beta(X)^v$, this becomes $\det(I - XA) = Q(X)/\beta(X)^v$. The sequences $M_i$ observed at the output of this RLFSM are of the form $P_i(X)/Q(X)$. The main result on windmill generators (cf. [4]) is the fact that there exists a permutation $\sigma$ of $\{0, \dots, v-1\}$ such that the series $S(X) = \sum_t \left( \sum_{i=0}^{v-1} m_i(t) X^{\sigma(i)} \right) X^{vt}$ is a rational power series of the form $P(X)/Q(X^v)$. In other words, a windmill generator is able to output in parallel, at each iteration, $v$ consecutive values of a rational power series. The most interesting case is the one where $Q(X^v)$ is a primitive polynomial. Such windmill generators are used in the specification of the pseudo-random generator E0 included in the specifications of Bluetooth [5].

Our polynomial approach gives a more synthetic point of view on these windmill generators. In particular, it shows that the windmill property (i.e., the parallel generation of a given m-sequence) is independent of the implementation of the vanes. This implementation can be made with Fibonacci vanes as in the original version, with Galois vanes as presented previously, or with ring vanes with a better diffusion delay, as we will see in the next section.



F. Implementation of RLFSMs 

In our previous examples, the starting point was a binary 
circuit, or a RLFSM with a particular structure for its matrix. 
The converse problem is "how to construct an efficient imple- 
mentation from a given transition matrix A of a RLFSM". We 
will show on two examples that this task is not so easy. 

1) A first example: Consider the RLFSM $\mathcal{L}_1$ defined by the following transition matrix:



A 



-1=1 X3 + 1 



X 

X 2 +X + l 





We compute $(I - XA)^{-1}$ to characterize the output sequences:

(I-XA)- 1 



x 



x->+x 2 



x 4 +x 3 +i 
x l +x 

, A" 4 +A" 3 + l 



X i +X 3 + 1 
X 4 +A' 3 + l / 



Figure [9] presents an implementation of this automaton built upon three LFSMs, one for each nonzero coefficient in A. These LFSMs are built using a Galois vane architecture as presented in Figure [6].

X 4 + X 2 + X


X 3 + X + 1 


X 2 + l 


4 











1 


X 5 + X 3 + X 2 + 1 


X 3 + X + 1 


X 4 + X + l 


X 


5 


1 


1 








X 4 + X 2 + X 


X 2 + l 


X 4 + X 3 + X 2 + X + 1 


1 


6 





1 


1 





X 5 + X 2 + X 


X 


X 3 +X 2 + X + 1 


X 4 + x 2 + x 


7 





1 


1 





X 5 + X 4 + X 3 + X 2 + X 


X 4 + X 2 + X + 1 


X 2 +X + 1 


X 3 + X + 1
1


1 


X 5 + X 4 + X 


X 4 + X 3 + X 2 + 1 


X + l 


X 2 + 1 



TABLE II

States of Figure [8] during 8 clocks.




Fig. 9. First implementation of $\mathcal{L}_1$.



Note that, according to the notation of Figure [9], $\mathcal{L}_1$ can be expressed as the LFSM $(A', 0, C)$ with:



A' 



/o 1 0\ 

1 

1 

1 1 
1 1 

\o 1 1 0/ 



c" 



1 1 
1 



In particular, we have the following relations according to Theorem 3.3:



y(*) 



1 



X 4 + X 3 + l 
1 X X 2 X 3 + X 2 X 
X X 2 X 3 1 X 2 



1 X 2 

X X 3 - 



-X 
X 2 



,(*) 



This implementation is not optimal because it requires seven memory cells while four are enough (it outputs sequences of the form $P(X)/(X^4 + X^3 + 1)$ with $\deg P(X) \le 3$). In particular, $\det(I - XA') = X^6 + X^3 + X^2 + X + 1$, i.e., this automaton could output m-sequences of the form $P(X)/(X^6 + X^3 + X^2 + X + 1)$ using a different matrix $C$ because $X^6 + X^3 + X^2 + X + 1$ is primitive.

A better implementation is given considering one LFSM per 
line. To do so, note that X i+ X+1 = jrfef • This leads to the 
implementation presented in Figure [10].

As previously this leads to the relation: 



( 



X 

X 4 +X 3 + l 



X 

X A +X :i + 1 

X 2 
X A +X :i + 1 



X 2 
X 4 +X a + 1 

X 3 
X 4 +X a + 1 




Fig. 10. Second implementation of $\mathcal{L}_1$.

2) Second example: Consider the RLFSM $\mathcal{L}_2$ defined by the following transition matrix:



A = 



x+i 

X 3 +X+l 

X 3 + X 2 





Figure [11] presents an implementation of this automaton built upon six LFSMs, one for each nonzero coefficient in A. These LFSMs are built using a Galois vane architecture as presented in Figure [6].

Note that, according to the notation of Figure [11], $\mathcal{L}_2$ can be expressed as the LFSM $(A', 0, C)$ with:



A = 




1 
1 






1 
1 







V 

















1 

J 



and 



00000000000 



100 

0000010001001 
0000000000000 



This implementation is not optimal because it requires fifteen memory cells while nine are enough, since $\deg(\det(I - XA)) = 9$. In particular, $\deg(\det(I - XA')) = 11$.



Fig. 11. First implementation of $\mathcal{L}_2$.

Fig. 12. Second implementation of $\mathcal{L}_2$.



A better implementation is given by considering one LFSM per line. This leads to the implementation presented in Figure [12].

This implementation is still not optimal because it requires eleven memory cells. This comes from the fact that, in the matrix $A$, two terms with identical denominators appear in the same column. More precisely, $\det(I - XA') = (X + 1)(X^2 + X + 1)(X^8 + X^7 + X^5 + X^4 + X^3 + X^2 + 1)$. Thus, the automaton could be implemented using nine cells, corresponding to the polynomial $(X + 1)(X^8 + X^7 + X^5 + X^4 + X^3 + X^2 + 1)$, which is reducible and thus not primitive, whereas the remaining factor $X^2 + X + 1$ disappears inside the automaton itself.



G. A practical example of application 

The rational representation is a theoretical tool that provides a global view of LFSR design, as seen in the case of windmill generators. However, the previous examples have shown that starting from a circuit in rational representation and obtaining an optimal implementation is not a simple task.

In the example given here, we generalize windmill generators through particular series circuits. We limit our study to an example built on 3 circuits, but the generalization of this method is straightforward.

Let $A_1(X) = P_1(X)/Q_1(X)$, $A_2(X) = P_2(X)/Q_2(X)$ and $A_3(X) = P_3(X)/Q_3(X)$ be 3 elements of Q. We consider the rational LFSR with transition matrix

\[
T = \begin{pmatrix} 0 & 0 & A_1 \\ A_2 & 0 & 0 \\ 0 & A_3 & 0 \end{pmatrix}.
\]

We have $\det(I - XT) = 1 - X^3 A_1 A_2 A_3 = Q(X)/(Q_1(X) Q_2(X) Q_3(X))$ with $Q(X) = Q_1(X) Q_2(X) Q_3(X) + X^3 P_1(X) P_2(X) P_3(X)$. The associated automaton computes rational series of the form $P(X)/Q(X)$.

Following the examples introduced in Figure [5] and in Section IV-A, we choose $A_1(X) = A_2(X) = (X^6 + X^3 + 1)/(X^8 + X^6 + X^5 + X^3 + 1)$ and $A_3(X) = (X^7 + X^5 + X^4 + X^2)/(X^8 + X^6 + X^5 + X^3 + 1)$. The connection polynomial (i.e., the numerator of $\det(I - XT)$) is $Q(X) = X^{24} + X^{21} + X^{16} + X^9 + X^7 + X^3 + 1$. This polynomial is primitive, so the automaton will produce m-sequences.
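The value of the connection polynomial in this example can be rechecked with a short sketch. It assumes a representation that is not in the paper: polynomials over $\mathbb{F}_2$ are stored as Python integers (bit $k$ holds the coefficient of $X^k$), and `poly` and `pmul` are hypothetical helper names. The computation uses $Q = Q_1 Q_2 Q_3 + X^3 P_1 P_2 P_3$, where all three denominators are equal in this example.

```python
def poly(*exponents):
    """Build a GF(2)[X] polynomial from the exponents of its nonzero terms."""
    value = 0
    for e in exponents:
        value ^= 1 << e
    return value


def pmul(a, b):
    """Carry-less product of two GF(2)[X] polynomials stored as integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


den = poly(8, 6, 5, 3, 0)   # common denominator Q_1 = Q_2 = Q_3
p1 = poly(6, 3, 0)          # numerator of A_1 and A_2
p3 = poly(7, 5, 4, 2)       # numerator of A_3

# Q = Q_1 Q_2 Q_3 + X^3 P_1 P_2 P_3 (addition is XOR over GF(2)).
q = pmul(pmul(den, den), den) ^ (pmul(pmul(p1, p1), p3) << 3)
exponents_of_q = [k for k in range(q.bit_length()) if (q >> k) & 1]
```

The list `exponents_of_q` gives the degrees of the nonzero terms, which can be compared with the polynomial stated in the text. The sketch only checks the arithmetic; it does not test primitivity.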

For a practical implementation, we can replace the Galois vanes associated with $A_1(X)$, $A_2(X)$ and $A_3(X)$ by the ring vanes presented in Section IV-A.

This leads to a classical binary LFSR with transition matrix 



T. 



where $T_2$ is the $8 \times 8$ matrix of the ring LFSR given in Figure [5] and where $E_{i,j}$ is the $8 \times 8$ matrix with only one 1 in position $(i,j)$. The matrices $E_{i,j}$ represent the connections between the 3 circuits. For example, the matrix $E_{1,4}$ corresponds to the input 1 of the first ring LFSR and the output 4 of the third LFSR.

Note that $\det(I - XT_r) = Q(X) = X^{24} + X^{21} + X^{16} + X^9 + X^7 + X^3 + 1$.

Suppose now that we prefer an implementation with Galois vanes as internal blocks. The matrix of the Galois vane is the matrix $T_0$ given in Figure [4]. The multiplication by $A_1(X)$ is performed using ${}^tB_1 = (1,0,0,1,0,0,1,0)$ in input and $C_1 = (1,0,0,0,0,0,0,0)$ for output. In the same way, we obtain $B_2 = B_1$, ${}^tB_3 = (0,0,1,0,1,1,0,1)$ and $C_2 = C_3 = C_1$. The equivalent binary circuit then has the transition matrix

\[
\begin{pmatrix} T_0 & 0 & B_1 C_3 \\ B_2 C_1 & T_0 & 0 \\ 0 & B_3 C_2 & T_0 \end{pmatrix}.
\]







As we will see in the next section, the automaton corresponding to the matrix $T_r$ has many nice properties compared to the classical ones obtained from the Galois LFSR. In particular, it needs 9 connections compared to 19 for the second one.

This example shows that the rational representation makes it possible to separate the global design of the automaton from the choices of the hardware (or software) implementation.

The method presented in this example can be directly generalized to all windmill generators and potentially leads to better practical implementations.

V. Design of efficient LFSRs for both hardware and software cryptographic applications

In this section, we specialize our work to autonomous LFSMs, in particular to LFSRs and their dedicated use in cryptographic applications.

A general purpose of cryptography is to design primitives that are efficient in both hardware and software, because such primitives must run on all possible platforms, from RFID tags to supercomputers. Thus, when designing cryptosystems, cryptographers must keep in mind the very wide range of targets on which cryptosystems must be fast and efficient. As evidence, the Rijndael algorithm, chosen as the AES [19] in 2001, was one of the most efficient algorithms in both hardware and software among the finalists of the AES competition.

Thus, designing well-chosen dedicated LFSMs that are efficient both in hardware and in software has direct consequences on the speed of the cryptosystems which use such primitives as building blocks. Among cryptographic primitives that use LFSMs, we could cite the most famous case: stream ciphers. Many stream ciphers, such as E0 [5], SNOW [?] or the finalists SOSEMANUK [20] and Grain v1 [21] of the eStream project [22], filter the content of one or many LFSMs to output pseudo-random bits. LFSMs could also be used as the diffusion layer of a block cipher, as proposed in [23]. More recently, in [24], a particular LFSM combined with two NLFSRs (Non-Linear Feedback Shift Registers) was proposed at CHES 2010 as the building block of a lightweight hash function named Quark. Designing LFSMs that satisfy good criteria is therefore crucial for symmetric key cryptography.

In this section, we first introduce the design criteria that must be fulfilled by an LFSM when used in cryptographic applications. We then extend the traditional concept of diffusion (well known in the block cipher context) to the case of LFSMs. This leads us to define a new criterion for choosing good LFSMs for cryptographic applications, which is the counterpart of the Shannon diffusion concept [25].

Then, we present previous works on LFSMs for hardware 
and software cryptographic applications. These automata have 
been widely studied (TJ, 0, Jffl, QUI, (26|, (H and practical 
constructions have emerged. We finally propose an efficient 
construction dedicated to hardware and a second one dedicated 
to software. This software construction is also efficient in 
hardware. 

A. Design criteria 

We focus our design analysis on two important properties. 
The first one characterizes the kind of sequences that are 



required for cryptographic applications whereas the second 
one tries to formalize the notion of diffusion delay in the 
context of LFSRs. 

1) m-sequences: As introduced in Section II, m-sequences are particular linear recurring sequences with good properties [?], [10]. For example, we give some properties of m-sequences of degree $n$ over $\mathbb{F}_2$:

• an m-sequence is balanced: the number of 1s is one greater than the number of 0s (considering one period).

• an m-sequence has the run property: a run is a subsequence of consecutive 1s or 0s, preceded and followed by the opposite symbol. Half of the runs are of length 1, a quarter of length 2, an eighth of length 3, etc., up to the 1-run of length $n$.

• an m-sequence is a punctured De Bruijn sequence.

• an m-sequence has the (ideal) two-level autocorrelation function, where the autocorrelation function of a binary sequence $a$ is defined as $C_a(\tau) = \sum_{i=0}^{N-1} (-1)^{a_{i+\tau} + a_i}$, where $N$ is the period of the sequence. For an m-sequence, this function satisfies $C_a(\tau) = N$ if $\tau \equiv 0 \bmod N$ and $C_a(\tau) = K$ if $\tau \not\equiv 0 \bmod N$ (where $K$ is a constant equal to $-1$ if $N$ is odd and to 0 if $N$ is even).

• an m-sequence has maximum period: an m-sequence satisfying a linear relation of degree $n$ has a period of $2^n - 1$.
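These properties can be observed on a small case. The sketch below is an illustration under assumptions of our own choosing: it uses the recurrence $s_{t+4} = s_{t+1} + s_t$ over $\mathbb{F}_2$, whose characteristic polynomial $X^4 + X + 1$ is primitive, and the helper names `lfsr_sequence` and `autocorrelation` are hypothetical.

```python
def lfsr_sequence(taps, state, length):
    """Generate `length` terms of s[t+n] = XOR of s[t+k] for k in taps."""
    s = list(state)
    n = len(state)
    while len(s) < length:
        t = len(s) - n
        bit = 0
        for k in taps:
            bit ^= s[t + k]
        s.append(bit)
    return s[:length]


N = 2 ** 4 - 1                       # expected period for degree n = 4
seq = lfsr_sequence(taps=(0, 1), state=(1, 0, 0, 0), length=2 * N)
period = seq[:N]


def autocorrelation(a, tau):
    """C_a(tau) = sum over one period of (-1)^(a[i+tau] + a[i])."""
    n = len(a)
    return sum((-1) ** (a[(i + tau) % n] ^ a[i]) for i in range(n))


ones = sum(period)                   # balance: one more 1 than 0
zeros = N - ones
off_peak = {autocorrelation(period, tau) for tau in range(1, N)}
```

With these choices, `ones - zeros` equals 1, the sequence repeats after 15 terms, and `off_peak` contains the single value $-1$, which matches the two-level autocorrelation stated above for odd $N$.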

In the sequel, we are especially interested in LFSMs having a primitive connection polynomial and producing m-sequences, which are the ones classically used in cryptography. In particular, all our examples satisfy this condition. However, most of the results remain true without this hypothesis.

2) Diffusion delay: The concept of diffusion for a cipher was introduced by C. Shannon in [25] as the dissipating effect of the redundancy of the statistical structure of a message $M$. This concept is directly linked with the avalanche effect defined by H. Feistel in [27], which is a desirable property of cryptographic algorithms, typically block ciphers and cryptographic hash functions. The avalanche effect means that if an input is changed slightly, the corresponding output must change significantly. In the case of block ciphers, such a small change in either the key or the plaintext should cause a drastic change in the ciphertext.

Two precise notions could be directly derived: the strict avalanche criterion (SAC) and the bit independence criterion (BIC). The strict avalanche criterion is a generalization of the avalanche effect. It is satisfied if, whenever a single input bit is complemented, each of the output bits changes with a 50% probability [28]. The bit independence criterion states that output bits $j$ and $k$ should change independently when any single input bit $i$ is inverted, for all $i$, $j$ and $k$.

When focusing on m-sequences, the measure of diffusion capacity is usually studied through the notions of correlation, auto-correlation and cross-correlation (see [29] for more details). The correlation of two binary m-sequences $\alpha = (a_1, \dots, a_n)$ and $\beta = (b_1, \dots, b_n)$ is measured as $C(\alpha, \beta) = \frac{1}{n}(A - D)$, where $A$ is the number of times, for $i$ from 1 to $n$, that $a_i$ and $b_i$ agree and $D$ is the number of times that $a_i$ and $b_i$ disagree. The auto-correlation of a given binary sequence has already been defined in the previous subsection. It represents the similarity between a sequence and its phase shift. The cross-correlation is defined as $C_{\alpha,\beta}(\tau) = \sum_{i=0}^{\mathrm{lcm}(s,t)-1} (-1)^{a_{i+\tau} + b_i}$ when $q = 2$ for two periodic binary sequences $\alpha$ of period $s$ and $\beta$ of period $t$, the sum running over one common period $\mathrm{lcm}(s, t)$ (for the case $q > 2$ the reader could refer to [29]).

Thus, in this part, we introduce a slightly different definition 
of diffusion of an LFSM to more precisely capture the behavior 
of the beginning of a sequence. This parameter measures the 
time needed to mix the content of the cells of an automaton. 
It could be expressed as the minimal number of clocks needed 
such that any memory cell has been influenced by any other. 

Definition 5.1: Let $\mathcal{L} = (A, 0, C)$ be an LFSM. Denote by $G$ the graph defined by the adjacency matrix $A^t$, i.e., if $a_{i,j} \neq 0$ then there exists a directed edge from vertex $j$ to vertex $i$. The diffusion delay is equal to the diameter of $G$.
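Definition 5.1 can be turned into a small computation. The following is a minimal sketch, assuming the transition matrix is given as a list of 0/1 rows; `diffusion_delay` and `shift_matrix` are hypothetical names, and the diameter is obtained with one breadth-first search per vertex. The function returns `None` when some cell can never influence another one (the graph is not strongly connected).

```python
from collections import deque


def diffusion_delay(A):
    """Diameter of the graph with an edge j -> i whenever A[i][j] != 0."""
    n = len(A)
    successors = [[i for i in range(n) if A[i][j]] for j in range(n)]
    worst = 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in successors[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) < n:          # some cell is never reached
            return None
        worst = max(worst, max(dist.values()))
    return worst


def shift_matrix(n):
    """Pure cyclic shift: a[i][j] = 1 if j = i + 1 mod n, 0 otherwise."""
    return [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]


plain = shift_matrix(4)            # no feedback besides the cyclic shift
ring = shift_matrix(4)
ring[2][0] = 1                     # illustrative extra connections,
ring[0][2] = 1                     # chosen only for this example
```

On these toy matrices the pure shift has a diffusion delay of $n - 1 = 3$, while the two extra connections of `ring` bring it down to 2.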

This parameter does not focus on the output sequence of an LFSM but on the sequences produced inside the register itself (i.e., we look at the sequences $(m_0(t), \dots, m_{n-1}(t), \dots)$) and thus depends on the implementation of the automaton.

From a general point of view, if we take a random graph with $n$ vertices, the average value of its diffusion delay is $\sqrt{n}$, as shown in [30]. For a complete graph, the diffusion delay is optimal and equal to 1. However, complete graphs do not produce good sequences, as the corresponding determinant $\det(I - AX)$ (where $A$ is the matrix representation of the complete graph) is equal to $X + 1$ if $n$ is odd and 1 otherwise, and thus could not produce sufficiently large m-sequences. Moreover, for a complete graph, from the circuit point of view, as the matrix of such a graph has $n^2$ non-zero terms, the corresponding circuit has $n^2 - n$ xors. In the same way, the required number of xors for a circuit representing a random graph is about $n^2/2$. But, for cryptographic applications with efficient implementations, we look for circuits with good properties and with about $n/2$ xors, which correspond to matrices with a binary weight equal to $3n/2$. Thus, we are far from circuits of complete or random graphs.
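The value of the determinant for the complete graph can be justified in one line. The sketch below assumes the notation $J = \mathbf{1}\,{}^t\mathbf{1}$ for the all-ones $n \times n$ matrix (a notation not used in the paper) and applies the matrix determinant lemma $\det(I + u\,{}^tv) = 1 + {}^tv\,u$.

```latex
\[
\det(I - XJ)
  = \det\bigl(I - X\,\mathbf{1}\,{}^t\mathbf{1}\bigr)
  = 1 - X\,{}^t\mathbf{1}\,\mathbf{1}
  = 1 - nX
  \equiv
  \begin{cases}
    1 + X & \text{if } n \text{ is odd},\\
    1     & \text{if } n \text{ is even},
  \end{cases}
  \pmod 2 .
\]
```

The degree of this polynomial is at most 1 whatever the size $n$, which is why a complete graph cannot yield a long m-sequence.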

So, we want to limit our study to lowering the diffusion delay when considering large m-sequences. More precisely, our aim in this section is twofold: we want to propose LFSRs that produce large m-sequences with an efficient implementation and with a low diffusion delay.

Let us now explain why it is important in a cryptographic context to lower the diffusion delay. This criterion aims at evaluating the speed needed to completely spread a difference into the automaton. More precisely, consider an LFSM of size $n$ with a diffusion delay $\delta$. Replacing the content of a cell $m_i$ by $m_i + 1$ may influence any cell $m_j$ with $0 \le j < n$ after $\delta$ clocks. It could also be expressed in terms of correlation: after $\delta$ clocks, the behavior of any cell is correlated with any other. More precisely, consider the two following sequences: the first sequence $\alpha = (a_1, \dots, a_N)$ is the sequence of the states of the register of an LFSR initialized with an $n$-bit word $a_1$ (i.e., each element $a_i$ of $\alpha$ is the content of the LFSR at time $i$ and is $n$ bits long). The second sequence of the same length $N$, $\beta = (b_1, \dots, b_N)$, is constructed in the same way with an initialization $b_1$ that differs from $a_1$ in a single bit position. Then, $C(\alpha, \beta)$ is lowered by the LFSR with the smaller diffusion delay for small values of $N$ (we have compared the results obtained for three LFSRs of length $n = 12$ bits (a Galois one, a Fibonacci one and a Ring one) and correlation values up to $N = 256$). Note that the effect of a small diffusion delay could only be observed for small values of $N$ because, after more clocks, the influence of each modified bit is complete whatever the value of the diffusion delay of the considered LFSR.

For example, considering Galois LFSRs, Fibonacci LFSRs and cellular automata of size $n$, the associated diffusion delay is $n - 1$ because the cells on each side, $m_0$ and $m_{n-1}$, require $n - 1$ clocks to mix together. On the other hand, Ring LFSRs make it possible to lower this parameter: their associated graph is closer to a random graph, and the expected value of the diameter of a random graph with $n$ vertices is $\sqrt{n}$. Ring LFSRs thus achieve a better diffusion delay. However, in practice, this value is an average that cannot always be reached, especially because we also focus our design choices on Ring LFSRs with a sparse transition matrix, i.e., we will consider graphs with few edges.

This diffusion delay criterion may be important for cryptographic purposes, where small differences in keys or in messages are required to have a large impact. It may also be useful to lower the dimension gap for Pseudo Random Number Generators as presented in [31], [26]. Indeed, the dimension gap decreases when an RNG outputs uniformly distributed points in a given sample space.

Moreover, this diffusion delay criterion could also be important in stream cipher design, to determine the number of clocks required by the so-called initialization phase and to speed up this step. Indeed, a stream cipher is composed of two phases: an initialization phase where no bits are output and a generation phase where bits are output. The initialization phase aims at mixing together the key bits and the IV bits. Thus, a lower diffusion delay makes it possible to speed up this mixing in terms of number of clocks. For example, the F-FCSR v3 stream cipher proposed in [32], based on a ring FCSR with a diffusion delay equal to $d$, has an initialization phase with only $d + 4$ clocks for mixing purposes, whereas the previous version of the F-FCSR family (F-FCSR v2) is based on a Galois FCSR and thus requires $n + 4$ clocks in the initialization step, where $n$ is the length of the considered FCSR. Thus, as $d < n$, a ring FCSR with a "good" (i.e., low) diffusion delay improves the general throughput of the stream cipher by speeding up the initialization step.

As previously suggested by the example concerning FCSRs, the diffusion delay criterion introduced in this section is essentially linked with the graph of the automaton, whatever the considered graph. It could therefore be applied to all possible automata: LFSRs, NLFSRs or FCSRs. For example, the FCSR used in the stream cipher F-FCSR v3 is a ring FCSR which has replaced a classical Galois FCSR. This modification halves the number of clocks required during the initialization step and completely discards the attack of Hell and Johansson [33] against F-FCSR v2, due to a better internal diffusion delay.

B. Efficient hardware design 

We show in this subsection how to achieve a good hardware design, and we first introduce the constraints required to achieve such a design:

• Critical path length: the longest path must be as short as possible to raise the frequency.

• Fan-out: a given signal should drive a minimum number of gates, as explained in [14].

• Cost: the number of logic gates must be as small as possible to lower consumption.

We focus on these parameters because lowering these values makes it possible to increase the frequency of the automata and, consequently, the throughput.

1) Previous works: Previous works have been done to lower those parameters. For example, in [34] the authors proposed the top-bottom LFSR: a Ring LFSR divided in two parts, a Fibonacci part and a Galois part, corresponding to a transition matrix of the form:


/ 9i 1 
92 1 



(0) 



9i-i 



(0) 



V i 



fi fi 



+ 1 



fj 



This approach is a trade-off between Galois and Fibonacci LFSRs. In particular, given a polynomial, there exists a top-bottom LFSR with this connection polynomial. The critical path length, the fan-out and the cost may thus be an average between the Galois and the Fibonacci cases. But this construction also carries the disadvantages of both cases, for example a slow diffusion.

In [17], the authors proposed a method that constructs, from a given LFSR, a similar LFSR with a lower critical path length and a lower fan-out. To do so, they modify step by step the transition matrix of the original LFSR using left and right shifts without modifying the corresponding value of the connection polynomial. For a given connection polynomial, those constructions lead to implementations with a critical path of length at most 2, a fan-out of at most 3 and a constant cost when starting the algorithm from a Galois LFSR. More precisely, their method behaves well on polynomials with uniformly distributed coefficients, i.e., polynomials with the same separation between any two consecutive non-zero coefficients. They give as an example the polynomial $X^{72} + X^{64} + X^{55} + X^{45} + X^{37} + X^{27} + X^{18} + X^9 + 1$, compared to $X^{72} + X^{48} + X^6 + X^5 + X^4 + X^3 + X^2 + X + 1$. In summary, their method leads to considering Ring LFSRs with a transition matrix of the form

(0)
hi



V 1 



h n -2 
h n -i 



h n -4 
h n -3 



(0) 



/ 



for the connection polynomial $X^n + h_{n-1} X^{n-1} + \cdots + h_1 X + 1$ and $n$ odd (the form is similar for $n$ even).

The authors also give a generic method (using two other elementary transformations called SDL and SDR that preserve the connection polynomial) to lower the hardware cost of an LFSR. To reach an LFSR with a better cost, the authors must apply their method step by step until a xor operation is reached using their algorithm. The point of view taken in that article is thus to start from a given connection polynomial and a given transition matrix and to reach a better form of the transition matrix (and thus a better hardware implementation) while keeping the same connection polynomial. The proposed methods are based on looking at similar LFSRs. However, from a given LFSR, not all the possible similar LFSRs can be reached using their algorithms. The corresponding diffusion delay of this kind of LFSR is about $n/2$. We show in the different examples given in this section that we could reach a better diffusion delay jointly with a more compact implementation.

2) Our approach: In most applications, the designer does not care about which connection polynomial is chosen for the LFSR but only needs to know that the connection polynomial is primitive. This is the core of our approach and of our proposal, where we randomly pick transition matrices with desired properties (that could be application-dependent) and verify a posteriori whether the obtained connection polynomial is primitive or not. To do so, we first need to express the previously required constraints in terms of the transition matrix of a Ring LFSR. Table [III] sums up those constraints using the following notations: denote by $\mathcal{L}$ a Ring LFSR of length $n$ with transition matrix $A$. We compute its connection polynomial $Q(X)$ and consider the associated Galois LFSR $\mathcal{L}_G$ and Fibonacci LFSR $\mathcal{L}_F$. We denote by $\mathrm{col}_0, \dots, \mathrm{col}_{n-1}$ the columns of $A$ and by $\mathrm{row}_0, \dots, \mathrm{row}_{n-1}$ its rows. We write $w := w_H(Q(X))$. All the presented constraints will be taken into account in our approach in order to reach an LFSM that satisfies all the requirements.

Galois LFSRs are optimal for the critical path, while Fibonacci LFSRs are optimal for the fan-out. A Ring LFSR can be built to reach these two values. More precisely, a Ring LFSR with a Hamming weight of at most 2 for its columns and its rows will have an optimal critical path and an optimal fan-out with a good diffusion delay, as summed up in Table [III].

                | Galois  | Fibonacci                   | Cellular automaton | Ring LFSR                                      | LFSR of [17]
Critical path   | 1       | $\lceil \log_2(w-1) \rceil$ | 2                  | $\max \lceil \log_2(w_H(\mathrm{row}_i)) \rceil$ | 2
Fan-out         | $w - 1$ | 2                           | 3                  | $\max w_H(\mathrm{col}_i)$                     | 3
Cost            | $w - 2$ | $w - 2$                     | $n$                | $w_H(T) - n$                                   | $w - 2$
Diffusion delay | $n - 1$ | $n - 1$                     | $n - 1$            | $\le n - 1$                                    | $n/2$

TABLE III
Critical path, fan-out, cost and diffusion delay of Galois LFSRs, Fibonacci LFSRs, cellular automata, generic Ring LFSRs and the construction proposed in [17].

However, we do not have an algorithm that constructs an LFSR with a given connection polynomial; we can only pick random transition matrices with good properties. Hence, as we allow the connections to be freely chosen, the constructed matrices do not present any special form that would allow the connection polynomial to be computed efficiently. Moreover, when considering LFSMs in practice, the constraint on the connection polynomial is simply to be primitive, not to have a particular value.

Require: $n$ the length of the Ring LFSR to seek; $f < n$ the number of feedbacks to place.
Ensure: A transition matrix $A$ with a critical path of length 1, a fan-out of 2 and a cost of $f$ logic gates and such that its connection polynomial is primitive of degree $n$.
repeat
    $A \leftarrow (a_{i,j})_{0 \le i,j < n}$ with $a_{i,j} = 1$ if $j = i + 1 \bmod n$, 0 otherwise
    while $w_H(A) < n + f$ do
        $(i, j) \leftarrow$ Random($\{0, \dots, n-1\} \times \{0, \dots, n-1\}$)
        if $w_H(\mathrm{row}_i) = 1$ and $w_H(\mathrm{col}_j) = 1$ then
            $a_{i,j} \leftarrow 1$
        end if
    end while
    $Q(X) \leftarrow \det(I - XA)$
until $Q(X)$ is primitive
return $A$

Fig. 13. Algorithm to pick randomly a Ring LFSR with a good hardware design.
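The random search of Fig. 13 can be sketched in Python for small sizes. This is an illustration under assumptions of our own, not the authors' implementation: polynomials over $\mathbb{F}_2$ are stored as integers (bit $k$ is the coefficient of $X^k$), $\det(I - XA)$ is computed by fraction-free (Bareiss) elimination, primitivity is tested through the multiplicative order of $X$, and all function names are hypothetical.

```python
import random


def pmul(a, b):
    """Carry-less product in GF(2)[X]."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def pdivmod(a, b):
    """Quotient and remainder of a by b in GF(2)[X] (b nonzero)."""
    q, db = 0, b.bit_length()
    while a.bit_length() >= db:
        shift = a.bit_length() - db
        q ^= 1 << shift
        a ^= b << shift
    return q, a


def connection_polynomial(A):
    """det(I - XA) over GF(2)[X]; pivots have constant term 1, so never 0."""
    n = len(A)
    M = [[(1 if i == j else 0) ^ (A[i][j] << 1) for j in range(n)]
         for i in range(n)]
    prev = 1
    for k in range(n - 1):
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                num = pmul(M[i][j], M[k][k]) ^ pmul(M[i][k], M[k][j])
                M[i][j] = pdivmod(num, prev)[0]   # exact division
        prev = M[k][k]
    return M[n - 1][n - 1]


def x_power_mod(e, q):
    """X^e mod q by square-and-multiply."""
    result, base = 1, 2
    while e:
        if e & 1:
            result = pdivmod(pmul(result, base), q)[1]
        base = pdivmod(pmul(base, base), q)[1]
        e >>= 1
    return result


def prime_factors(m):
    factors, p = set(), 2
    while p * p <= m:
        while m % p == 0:
            factors.add(p)
            m //= p
        p += 1
    if m > 1:
        factors.add(m)
    return factors


def is_primitive(q, n):
    """True if q has degree n and X has order 2^n - 1 modulo q."""
    if q.bit_length() - 1 != n or not q & 1:
        return False
    order = (1 << n) - 1
    if x_power_mod(order, q) != 1:
        return False
    return all(x_power_mod(order // p, q) != 1 for p in prime_factors(order))


def random_ring_lfsr(n, f, rng):
    """Random search following the steps of Fig. 13."""
    while True:
        A = [[1 if j == (i + 1) % n else 0 for j in range(n)]
             for i in range(n)]
        placed = 0
        while placed < f:
            i, j = rng.randrange(n), rng.randrange(n)
            row_w = sum(A[i])
            col_w = sum(row[j] for row in A)
            if row_w == 1 and col_w == 1 and A[i][j] == 0:
                A[i][j] = 1
                placed += 1
        if is_primitive(connection_polynomial(A), n):
            return A


A = random_ring_lfsr(7, 1, random.Random(1))   # small illustrative size
```

The sketch is meant for small $n$ only: the trial-division factorization of $2^n - 1$ and the pure Python polynomial arithmetic would need to be replaced for sizes such as $n = 128$.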

Algorithm [13] picks random feedback positions and computes the associated connection polynomial. This algorithm is probabilistic. We expect that picking a random matrix of size $n$ and computing its connection polynomial is equivalent to picking a random polynomial of degree $n$. More precisely, we know that the connection polynomial has its constant coefficient and its leading coefficient equal to 1, so the number of polynomials that can be constructed is $2^{n-2}$. The number of primitive polynomials of degree $n$ over $\mathbb{F}_2$ is $\varphi(2^n - 1)/n$, where $\varphi$ is the Euler function. We expect Algorithm [13] to be successful after $\frac{2^{n-2}}{\varphi(2^n - 1)/n}$ tries, as presented in Fig. [14].
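The count of primitive polynomials and the resulting estimate can be evaluated numerically. The sketch below is only an illustration: `euler_phi`, `primitive_count` and `expected_tries` are hypothetical names, the factorization is done by trial division (adequate for small $n$ only), and the estimate uses the $2^{n-2}$ count of candidate polynomials as stated in the text.

```python
def euler_phi(m):
    """Euler's totient function, by trial division."""
    result, p = m, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result


def primitive_count(n):
    """Number of primitive polynomials of degree n over GF(2)."""
    return euler_phi(2 ** n - 1) // n


def expected_tries(n):
    """Estimate from the text: candidates divided by primitive polynomials."""
    return 2 ** (n - 2) / primitive_count(n)
```

For instance, `primitive_count(4)` counts the two primitive polynomials $X^4 + X + 1$ and $X^4 + X^3 + 1$.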

The time complexity of this algorithm is driven by the time it takes to compute $\det(I - XA)$, which is roughly $O(n^3)$.

For a hardware-oriented LFSM, each feedback can be freely placed. Using this property, we can lower the complexity of the previous algorithm by reusing intermediate computations based on the cofactors of the matrix $A$, as follows:

Proposition 5.2: Let $A$ be a matrix of size $n \times n$ over a ring $R$, and denote by $E_{i,j}$ the matrix with a single 1 in position $(i,j)$. Then we have $\det(A + X E_{i,j}) = \det(A) + X\,\mathrm{cof}_{i,j}$, where $\mathrm{cof}_{i,j}$ denotes the $(i,j)$-th cofactor of the matrix $A$.

Fig. 14. Theoretic and empirical number of trials needed for Algorithm [13].
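Proposition 5.2 can be checked numerically on a small case. The sketch below works over the integers rather than over a polynomial ring, with a naive permutation-based determinant; the matrix, the position $(i,j)$ and the scalar standing for $X$ are arbitrary illustrative choices, and `det` and `cofactor` are hypothetical helper names.

```python
from itertools import permutations


def det(M):
    """Determinant by the Leibniz formula (fine for very small matrices)."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for a in range(n):
            for b in range(a + 1, n):
                if perm[a] > perm[b]:
                    sign = -sign
        product = 1
        for r in range(n):
            product *= M[r][perm[r]]
        total += sign * product
    return total


def cofactor(M, i, j):
    """(i, j)-th cofactor: signed determinant of M without row i, column j."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]
    return (-1) ** (i + j) * det(minor)


A = [[2, 0, 1], [1, 3, 4], [0, 5, 6]]
i, j, x = 0, 2, 7                      # x plays the role of X
A_plus = [row[:] for row in A]
A_plus[i][j] += x                      # A + x * E_{i,j}
lhs = det(A_plus)
rhs = det(A) + x * cofactor(A, i, j)
```

Both sides agree because the determinant is linear in each entry, with the cofactor as coefficient. This is what allows many candidate feedback positions to be tested from a single cofactor matrix.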

The cofactor matrix of a matrix is equal to the transpose of its adjugate matrix, which can be computed with classical inversion algorithms. Using the previous proposition, we are able to improve the complexity of our algorithm using Algorithm [15].

The complexity of this algorithm is driven by the computation of the cofactor matrix and of its determinant, which can be achieved by a common algorithm. Each computation of a cofactor matrix costs $O(n^3)$ operations. With a single cofactor matrix, we test roughly $n^2 - nf$ polynomials. So the average complexity is about $O(n)$ operations per tested polynomial.

3) Example: We give in Appendix A an example of a
hardware oriented LFSR of length 128 found using Algorithm
15. This LFSR has a primitive connection polynomial which
has a Hamming weight of 65. The diffusion delay of this
LFSR is only 27 whereas the corresponding diffusion delay
for a Galois or a Fibonacci LFSR would be 127.
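To give a feeling for how extra feedbacks shorten the diffusion delay, here is a minimal sketch. It assumes that the diffusion delay can be read as the diameter of the directed graph of the transition matrix (the precise definition is given earlier in the paper and is not restated here), and that an entry a[i][j] = 1 means cell j feeds cell i. The function names are hypothetical.

```python
from collections import deque


def ring_matrix(n, feedbacks=()):
    """0/1 matrix with a[i][(i+1) % n] = 1 plus extra entries (i, j)."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        a[i][(i + 1) % n] = 1
    for i, j in feedbacks:
        a[i][j] = 1
    return a


def graph_diameter(a):
    """Longest shortest path in the digraph with an edge j -> i when a[i][j] = 1."""
    n = len(a)
    succ = [[i for i in range(n) if a[i][j]] for j in range(n)]
    worst = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in succ[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < n:
            return None  # some cell never influences another one
        worst = max(worst, max(dist.values()))
    return worst


print(graph_diameter(ring_matrix(8)))                    # plain ring
print(graph_diameter(ring_matrix(8, [(0, 4), (4, 0)])))  # ring with two feedbacks
```

Under this reading, a plain ring of n cells has diameter n - 1, which matches the value 127 quoted for length 128, and well-spread feedbacks lower it.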

C. Efficient software and hardware design 

In the previous subsection, we focus our work on an efficient 
algorithm to find efficient LFSRs for hardware design. In this 
subsection, we will show how we could adapt those results for 
efficient software design of an LFSR and show how this design 
is also efficient in hardware. The main difference between 
hardware and software is the atomic data size. In hardware 
we operate on single bits, whereas in software bits are natively 
packed in words such that working on single bits is not natural 
and needs additional operations. The word size depends on the 
architecture of the processor: 8 bits, 16 bits, 32 bits, 64 bits
or more. To benefit from this architecture we propose to use
LFSRs acting on words. Let us first summarize the previous
works that have been done to optimize software performances
of LFSRs. Then, we introduce our construction method to
build LFSRs efficient in software and in hardware.

Require: n the length of the Ring LFSR to seek. f < n the
    number of feedbacks to be placed.
Ensure: A transition matrix A with a critical path of length
    1, a fan-out of 2 and a cost of f logic gates and such that
    its connection polynomial is primitive of degree n.
loop
    A <- (a_{i,j})_{0 <= i,j < n} with a_{i,j} = 1 if j = i + 1 mod n, 0 otherwise
    while w_H(A) < n + f - 1 do
        (i, j) <- Random({0, ..., n-1} x {0, ..., n-1})
        if w_H(row_i) = 1 and w_H(col_j) = 1 then
            a_{i,j} <- 1
        end if
    end while
    C <- cofactors matrix of I - XA
    Q_0(X) <- det(I - XA)
    for 0 <= i, j < n do
        if w_H(row_i) = 1 and w_H(col_j) = 1 then
            Q(X) <- Q_0(X) - X c_{i,j}
            if Q(X) is primitive then
                Break
            end if
        end if
    end for
end loop
return A

Fig. 15. Algorithm to pick randomly a Ring LFSR with a good hardware
design.
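The random placement step of Fig. 15 (the inner while loop) can be sketched as follows. This is an illustration only: the function name and the `rng` parameter are hypothetical, the primitivity test is left out, and reading "critical path 1, fan-out 2" as "every row and every column has weight at most 2" is an assumption made for this sketch.

```python
import random


def place_feedbacks(n, count, rng=random):
    """Ring matrix plus `count` feedbacks, each row and column of weight <= 2."""
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        a[i][(i + 1) % n] = 1
    placed = 0
    while placed < count:
        i, j = rng.randrange(n), rng.randrange(n)
        row_weight = sum(a[i])
        col_weight = sum(a[r][j] for r in range(n))
        # Accept only a free position whose row and column still hold
        # nothing but their ring entry.
        if a[i][j] == 0 and row_weight == 1 and col_weight == 1:
            a[i][j] = 1
            placed += 1
    return a


matrix = place_feedbacks(16, 5, random.Random(1))
print(sum(map(sum, matrix)))  # total weight: n ring entries plus the feedbacks
```

The sketch assumes count < n, as in the Require clause, so that a free position always exists.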

1) Previous works: Firstly, the Generalized Feedback Shift
Registers were introduced in [35] to increase the throughput.
The main idea here was to parallelize w Fibonacci LFSRs.
More formally, the corresponding matrix of such a construction
is:



$$A = \begin{pmatrix}
0 & I_w & & (0) \\
 & & \ddots & \\
(0) & & & I_w \\
\cdots & a_2 I_w & a_1 I_w & a_0 I_w
\end{pmatrix}$$



where $I_w$ represents the w x w identity matrix over $\mathbb{F}_2$ and
where the $a_i$ for i in [0, .., n - 2] are binary coefficients. The
matrix A could be seen at bit level but also at w-bit word
level: each bit of the w-bit word is in fact one bit of the
internal state of one Fibonacci LFSR among the w LFSRs.
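The word-level view can be illustrated with a small sketch. Everything here is an assumption for illustration: the function names, the tap positions and the update direction (each word moves one position down and the new last word is the XOR of the tapped words). The check at the end confirms that every bit lane of the word register behaves as an independent bit-level Fibonacci LFSR with the same taps.

```python
def gfsr_step(state, taps):
    """One clock of a word-level Fibonacci register.

    state: list of n words (Python ints); taps: indices of the words that are
    XORed together to form the new last word (hypothetical coefficients).
    """
    new_word = 0
    for j in taps:
        new_word ^= state[j]
    return state[1:] + [new_word]


def lane(state, b):
    """Extract bit lane b of every word: the state of one bit-level LFSR."""
    return [(word >> b) & 1 for word in state]


w = 4
state = [0b1010, 0b0111, 0b0001, 0b1100, 0b0110]
taps = [0, 2]  # hypothetical feedback positions
lanes = [lane(state, b) for b in range(w)]
for _ in range(20):
    state = gfsr_step(state, taps)
    lanes = [gfsr_step(bits, taps) for bits in lanes]
    assert all(lane(state, b) == lanes[b] for b in range(w))
```

The same `gfsr_step` function serves for both views because XOR on words acts lane by lane.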

In 0, Roggeman applied the previous definition to LFSRs 
to obtain the Generalized Linear Feedback Shift Registers but 
in this case the matrix T is always defined at bit level. In 1992, 
Matsumoto in [36] generalized this last approach by considering
LFSRs no longer at bit level but at bit-vector level (called words).



This representation is called Twisted Generalized Feedback
Shift Register whereas the same kind of architecture was also
described in [37] and called the Mersenne Twister. In those
approaches, the considered LFSRs are in Fibonacci mode seen
at word level with a unique linear feedback. The corresponding
matrices are of the form:



$$A = \begin{pmatrix}
0 & I_w & & & (0) \\
 & & \ddots & & \\
(0) & & & & I_w \\
I_w & (0) & L & (0) & 0
\end{pmatrix}$$



where $I_w$ represents the w x w identity matrix and where L is
a w x w binary matrix. In this case, the matrix is defined over
$\mathbb{F}_2$ but could also be seen at w-bit word level. This is the
first generalization of LFSRs specially designed for software
applications due to the word oriented structure.

The last generalization was introduced in 1995 in [38]
with the Multiple-Recursive Matrix Method and used in the
Xorshift Generators described in [39] and well studied in [26].
In this case, the LFSRs used are in Fibonacci mode with
several linear feedbacks. The matrix representation is:
$$A = \begin{pmatrix}
0 & I_w & & (0) \\
 & & \ddots & \\
(0) & & & I_w \\
A_r & A_{r-1} & A_{r-2} & \cdots
\end{pmatrix}$$



where $I_w$ is the identity matrix and where the matrices $A_i$ are
software efficient transformations such as right or left shifts
at word level or word rotation. The main advantage of this
representation is its word-oriented software efficiency but it
also preserves all the good LFSRs properties if the underlying
polynomial is primitive. Moreover, using the special form of
the transition matrix, the connection polynomial is efficiently
computed with the formula $P(X) = \det\left(I + \sum_{j=1}^{r} X^j A_j\right)$.

A particular case of the Multiple-Recursive Matrix Method
is studied in [40]. The authors proposed to consider matrices
$A_i$ of the form $a_i \cdot T$ where T is a square matrix of size
w, and the $a_i$ are scalar elements. In this case, an algorithm to
construct LFSMs with primitive polynomials is given. This
paper was the first to introduce efficient word-oriented LFSRs,
thus solving the challenge proposed by Bart Preneel in fill . 

Another way to construct software oriented LFSRs is to
consider LFSRs over $\mathbb{F}_{2^w}$ as done in Q, EOJ. The SNOW
LFSR is given in Appendix B. This interpretation allows the
use of table-lookup optimization and gives good results. Those
automata could be interpreted as linear automata over $\mathbb{F}_2$
because of the mapping $\mathbb{F}_{2^w} \to (\mathbb{F}_2)^w$. In particular, they
can be considered as a special case of our proposal.
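The linearity over $\mathbb{F}_2$ of a multiplication in $\mathbb{F}_{2^w}$ can be checked on a toy case. The sketch below uses w = 8 and the reduction polynomial 0x11B (the AES field polynomial), both chosen purely for illustration and unrelated to the fields used by SNOW. The function names are hypothetical.

```python
def gf256_mul(a, b, modulus=0x11B):
    """Multiply a and b in F_2[x]/(modulus); 0x11B is an illustrative choice."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= modulus
        b >>= 1
    return result


def mul_matrix_columns(c):
    """Columns of the 8x8 binary matrix of x -> c*x: images of 1, 2, 4, ..."""
    return [gf256_mul(c, 1 << i) for i in range(8)]


def apply_matrix(columns, x):
    """Matrix-vector product over F_2: XOR the columns selected by bits of x."""
    out = 0
    for i, col in enumerate(columns):
        if (x >> i) & 1:
            out ^= col
    return out


cols = mul_matrix_columns(0x53)
assert all(apply_matrix(cols, x) == gf256_mul(0x53, x) for x in range(256))
```

Storing the eight column images is also the idea behind a table-lookup implementation: the product is obtained by XORing precomputed words.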

2) Our proposal for building LFSRs efficient in software 
and in hardware: As for the hardware case our approach 
focuses on the construction of a software oriented transition 
matrix. To do so, we will use transition matrices defined by 
block. In the next algorithm, A will define a block matrix, i.e.,
A is taken in $\mathcal{M}_{n/k}(\mathcal{M}_k(\mathbb{F}_2))$ for a matrix of size n divided
in blocks of size k over $\mathbb{F}_2$. When an LFSR is defined
by block, we call it a word-LFSR.

Require: k the word size, n the length of the LFSR to seek
    with k | n. f < n/k the number of word-feedbacks to place.
Ensure: A transition matrix A defined by block with a cost
    of f shift and xor operations and such that its connection
    polynomial is primitive of degree n.
repeat
    A <- (a_{i,j})_{0 <= i,j < n/k} with a_{i,j} = I_k if j = i + 1 mod n/k, 0 otherwise
    From <- Random({0, ..., n/k - 1}^f)
    To <- Random({0, ..., n/k - 1}^f)
    Shift <- Random(([-k/2, k/2] \ {0})^f)
    for l = 0 to f - 1 do
        a_{To[l],From[l]} <- a_{To[l],From[l]} + L^{Shift[l]}   if Shift[l] > 0
        a_{To[l],From[l]} <- a_{To[l],From[l]} + R^{-Shift[l]}  otherwise
    end for
    Q(X) <- det(I - XA)
until Q(X) is primitive
return A

Fig. 16. Algorithm to pick randomly an LFSR with a good software design.

Moreover we will use the right and left shift operations
(denoted >> and <<) which are fast and implemented at word
level. Given a word size k we define the matrix L of left shift
as the k x k matrix with ones on its overdiagonal and zeros
elsewhere. Similarly, the matrix R of right shift is defined
as the k x k matrix with ones on its sub-diagonal and zeros
elsewhere, such that we have:

$$L \cdot (x_0, x_1, \ldots, x_{k-1})^t = (x_1, \ldots, x_{k-1}, 0)^t$$

$$R \cdot (x_0, x_1, \ldots, x_{k-1})^t = (0, x_0, x_1, \ldots, x_{k-2})^t$$
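The correspondence between these two matrices and machine shifts can be checked directly. The sketch below is illustrative only: the function names are hypothetical, and packing $x_0$ as the most significant bit of the word is an assumption chosen so that L matches the usual `<<` operator.

```python
def shift_matrix(k, direction):
    """k x k binary matrix: ones on the overdiagonal ('L') or sub-diagonal ('R')."""
    offset = 1 if direction == "L" else -1
    return [[1 if j == i + offset else 0 for j in range(k)] for i in range(k)]


def mat_vec(m, x):
    """Matrix-vector product over F_2."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in m]


def to_word(bits):
    """Pack bits into an int, taking x_0 as the most significant bit."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


k, mask = 8, 0xFF
L, R = shift_matrix(k, "L"), shift_matrix(k, "R")
x = [1, 0, 1, 1, 0, 0, 1, 0]
assert mat_vec(L, x) == x[1:] + [0]
assert mat_vec(R, x) == [0] + x[:-1]
assert to_word(mat_vec(L, x)) == (to_word(x) << 1) & mask
assert to_word(mat_vec(R, x)) == to_word(x) >> 1
```

With the opposite packing convention (x_0 as least significant bit) the roles of `<<` and `>>` would simply be exchanged.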



Remark that LFSRs over $\mathbb{F}_{2^w}$ can be expressed as word-LFSRs
where the operations used are multiplications on $\mathbb{F}_{2^w}$ seen
as a vector space over $\mathbb{F}_2$, i.e., there exists a bijection between
$\mathbb{F}_{2^w}$ and $(\mathbb{F}_2)^w$.

According to the previous discussion we propose Algorithm
16 to build efficient software LFSRs.

This algorithm picks random word-feedback positions and
shift values, and computes the associated connection polynomial.
The complexity of this algorithm is about the same as that of
Algorithm 13 because we have not been able to use the block
structure of the matrix to lower the determinant computation
complexity.
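To make the software cost concrete, one clock of a word-LFSR built this way can be sketched with word operations only. This is a hedged illustration rather than the authors' implementation: the function name and the `(to, frm, shift)` triple format are hypothetical, the state update is assumed to be "new state = A times old state", and a positive shift is mapped to `<<` under the most-significant-bit-first packing.

```python
def word_lfsr_step(state, feedbacks, k):
    """One clock of a word-LFSR with n/k words of k bits.

    feedbacks: list of (to, frm, shift); shift > 0 is a left shift by `shift`,
    shift < 0 a right shift by -shift, mirroring L^s and R^-s in the text.
    """
    mask = (1 << k) - 1
    words = len(state)
    # Ring part: block I_k at (i, i+1 mod n/k) moves word i+1 into word i.
    new_state = [state[(i + 1) % words] for i in range(words)]
    for to, frm, shift in feedbacks:
        if shift > 0:
            term = (state[frm] << shift) & mask
        else:
            term = state[frm] >> -shift
        new_state[to] ^= term
    return new_state


# Hypothetical parameters: 5 words of 8 bits, two word-feedbacks.
state = [0x01, 0x02, 0x04, 0x08, 0x10]
print(word_lfsr_step(state, [(0, 3, 1), (2, 0, -1)], 8))
```

Each feedback costs one shift and one XOR per clock, which is the cost measure used in the Ensure clause of Fig. 16.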

3) Example: We give in Figure 17 an example of an LFSR
with an efficient software design with n = 40 and k = 8 and a
primitive connection polynomial. The corresponding hardware 
implementation of this LFSR is also very good due to its 
intrinsic structure (a fan out of 2, a critical path of length 1 and 
a cost of 19 adders) and because it fulfills the requirements of 
Alg [B] The diffusion delay of this LFSR is 27. 

Let us now also compare a word oriented LFSR picked 
using our algorithm to the SNOW2.0 LFSR defined in Q. 



Fig. 17. An LFSR with efficient software design: (a) transition matrix; (b) representation.

The two LFSRs are respectively described in Appendix B and
in Appendix C.

These two LFSRs output m-sequences of degree 512. We 
compare the diffusion delay and the throughput in software 
for those two LFSRs: 

• The diffusion delay of the SNOW LFSR is 49 compared
  to 33 for our LFSR.
• The cost of one clock is 8 cycles for the SNOW LFSR 
using the sliding window implementation as proposed in 
Q (this technique could be only applied for a Fibonacci 
LFSR). The cost for this LFSR implemented using clas- 
sical implementation is 20 cycles. The cost for our LFSR 
is 33 cycles. 

As presented the diffusion delay is better for our LFSR. 
However, the cost of one clock is higher in our case. This 
is due to the fact that the SNOW LFSR is sparse (three 
feedbacks) while ours has 8 feedbacks. Moreover, the com- 
putations are made using precomputed tables which leads to a 
better cost. However, the hardware implementation of our own 
LFSR has a really low cost (it fulfills the hardware design 
criteria we require in the previous section: critical path of 
length 1, fan-out of 2) whereas the SNOW2.0 LFSR could not 
be efficiently implemented in hardware due to the precomputed 
tables. 

D. Conclusion 

To sum up the results given in this section, we have
proposed two algorithms, one for hardware and one for
software, that allow efficient LFSRs to be built with a low
diffusion delay and good implementation criteria. Moreover,
building an LFSR using Alg. 16 leads to an LFSR with good
cryptographic properties and an efficient implementation both
in software and in hardware.

VI. Conclusion 

In this paper, we have shown how to link together matrix 
representations and polynomial representations for efficient 
LFSMs, LFSRs and windmill LFSRs constructions. Those 
new representations lead to efficient implementations both in 
software and in hardware. We have compared new Ring LFSR 
constructions with LFSRs used in several stream ciphers and 
we have shown that Ring LFSRs always have a better diffusion
delay with better hardware performances and good software 
performances. 

In further works, we aim at looking more precisely at the
case of an LFSM with $\ell$ output bits to give equivalent and
general representations. We also want to generalize those new
results to Finite State Machines that are no longer linear. The
same kind of generalization could be efficiently applied to
Feedback with Carry Shift Registers (FCSRs) or to Algebraic 
Feedback Shift Registers (AFSRs). 
Appendix

A. Example of a Ring LFSR of size 128 bits 

We describe a Ring LFSR of size 128 bits. The transition
matrix $A = (a_{i,j})$ is given by:

$a_{i,i+1} = 1$ for all $0 \le i < 127$
$a_{127,0} = 1$
$a_{i,j} = 1$ for $(i,j) \in T$
where T is the set:

(4,78), (?,19), (8,44), (9,106), (10,70), (12,14), (14,115), (15,55),
(17,82), (21,64), (22,12), (25,127), (27,107), (28,112), (31,59), (34,111),
(35,48), (37,36), (38,23), (39,88), (43,37), (44,26), (46,60), (47,100),
(49,24), (50,25), (51,2), (51,27), (55,124), (57,113), (59,71), (61,29),
(69,123), (72,52), (73,118), (77,46), (80,74), (81,83), (83,98), (87,53),
(88,73), (91,47), (93,10), (94,21), (95,93), (97,13), (98,117), (99,50),
(100,3), (101,104), (104,1), (105,114), (106,108), (107,105), (109,4), (111,28),
(112,68), (113,42), (114,31), (119,18), (120,49), (121,32), (123,94), (124,6)



This LFSR has a primitive connection polynomial. It has a 
cost of 64 adders, a fan-out equal to 2 and a critical path of 
1, and a diffusion delay of 27. 

B. Description of the LFSR in SNOW 2.0 over $\mathbb{F}_2$

We give here a description of the LFSR used in SNOW 2.0
seen as an LFSR over $\mathbb{F}_2$.

First this LFSR is defined as a Fibonacci LFSR over $\mathbb{F}_{2^{32}}$.
The field $\mathbb{F}_{2^{32}}$ is defined as an extension of $\mathbb{F}_{2^8}$ to allow
an efficient implementation and to prevent the guess-and-determine
attack presented in [42].

The implementation is based upon the multiplication by $\alpha \in \mathbb{F}_{2^{32}}$
satisfying $\alpha \cdot (c_3 \alpha^3 + c_2 \alpha^2 + c_1 \alpha + c_0) = (c_2 \alpha^3 + c_1 \alpha^2 + c_0 \alpha) + c_3 \cdot V$
with V an element in $\mathbb{F}_{2^{32}}$. We denote $M_\alpha$ the
matrix of this linear application seen over $\mathbb{F}_2^{32}$:
(0)
(0)
1



where 

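$V_0 = {}^t(\texttt{0xE19FCF13})$
$V_1 = {}^t(\texttt{0x6B973726})$
$V_2 = {}^t(\texttt{0xD6876E4C})$
$V_3 = {}^t(\texttt{0x05A7DC98})$
$V_4 = {}^t(\texttt{0x0AE71199})$
$V_5 = {}^t(\texttt{0x1467229B})$
$V_6 = {}^t(\texttt{0x28CE449F})$
$V_7 = {}^t(\texttt{0x50358897})$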

Then the transition matrix of the LFSR of SNOW2.0 is
presented in Figure 18.

Fig. 18. Transition matrix of SNOW2.0

Fig. 19. Transition matrix of a word oriented LFSR



C. Example of a word-oriented LFSR of size 512 bits 

We give in Figure 19 a description of a word-oriented LFSR of length 512 with words of 32 bits. The grid in the matrix is
drawn for readability.



